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Abstract 



Particle splitting methods are considered for the estimation of rare 
events. The probability of interest is that a Markov process first enters 
a set B before another set A, and it is assumed that this probability 
satisfies a large deviation scaling. A notion of subsolution is defined 
for the related calculus of variations problem, and two main results 
^ ' are proved under mild conditions. The first is that the number of par- 

C^^ . tides generated by the algorithm grows subexponentially if and only 

L^ ' if a certain scalar multiple of the importance function is a subsolu- 

tion. The second is that, under the same condition, the variance of 
the algorithm is characterized (asymptotically) in terms of the subso- 
lution. The design of asymptotically optimal schemes is discussed, and 
r~^ ' numerical examples are presented. 

o' 



1 Introduction 

The numerical estimation of probabilities of rare events is a difficult problem. 
There are many potential applications in operations research and engineer- 
ing, insurance, finance, chemistry, biology, and elsewhere, and many papers 
(and by now even a few books) have proposed numerical schemes for partic- 
ular settings and applications. Because the quantity of interest is very small, 
standard Monte Carlo simulation requires an enormous number of samples 
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for the variance of the resulting estimate to be comparable to the unknown 
probability. It quickly becomes unusable, and more efficient alternatives are 
sought. 

The two most widely considered alternatives are those based on change- 
of-measure techniques and those based on branching processes. The former 
is usually called importance sampling, and the latter is often referred to as 
multi-level splitting. While good results on a variety of problem formulations 
have been reported for both methods, it is also true that both methods can 
produce inaccurate and misleading results. The design issue is critical, and 
one can argue that the proper theoretical tools for the design of importance 
sampling and splitting algorithms were simply not available for complicated 
models and problem formulations. [An alternative approach based on inter- 
acting particles has also been suggested as in [3]. However, we are unaware 
of any analysis of the performance of these schemes as the probability of 
interest becomes small.] 

Suppose that the probability of interest takes the form p = P {Z £ G} = 
/i(G), where G is a subset of some reasonably regular space (e.g., a Polish 
space S) and fi a probability measure. In ordinary Monte Carlo one gener- 
ates a number of independent and identically distributed (iid) samples {Zi} 
from //, and then estimates p using the sample mean of l^ZteG}- ^^ the 
case of importance sampling, one uses an alternative sampling distribution 
u, generates iid samples {-^i} from u, and then estimates via the sample 
mean of [dfj,/di^] (Zj)lr^ ^^1. The Radon-Nikodim derivative [dfi/du] (Zi) 

guarantees that the estimate is unbiased. The goal is to choose z^ so that 
the individual samples [dfi/du] (Zj)lr^^^i cluster tightly around p, thereby 
reducing the variance. However, for complicated process models or events 
G the selection of a good measure v may not be simple. The papers [10, 11] 
show how certain standard heuristic methods based on ideas from large de- 
viations could produce very poor results. The difficulty is due to points in S 
with low probability under v for which dji/du is very large. The aforemen- 
tioned large deviation heuristic does not properly account for the contribu- 
tion of these points to the variance of the estimate, and it is not hard to find 
examples where the corresponding importance sampling estimator is much 
worse than even ordinary Monte Carlo. The estimates exhibit very inaccu- 
rate and/or unstable behavior, though the instability may not be evident 
from numerical data until massive amounts have been generated. 

The most discussed application of splitting type schemes is to first en- 
trance probabilities, and to continue the discussion we specialize to that 
case. Thus Z is the sample path of a stationary stochastic process {Xi} 



(which for simphcity is taken to be Markovian), and G is the set of trajec- 
tories that first enter a set B prior to entering a set A. More precisely, for 
disjoint B and A and x ^ AU B, 

p = p{x) = P{Xj eB,Xi ^A,ie{0,...,j},j < cx)\Xo = x}. 

In the most simple version of splitting, the state space is partitioned accord- 
ing to certain sets B C Cq C Ci C ■ ■ ■ C Ck, with x ^ Ck and A n Ck = 0- 
These sets are often defined as level sets of a particular function V, which 
is commonly called an importance function. Particles are generated and 
killed off according to the following rules. A single particle is started at 
X. Generation of particles (splitting) occurs whenever an existing particle 
reaches a threshold or level Ci for the first time. At that time, a (possibly 
random) number of new particles are placed at the location of entrance into 
Cj. The future evolutions of these particles are independent of each other 
(and all other particles), and follow the law of {^i}- Particles are killed if 
they enter A before B. Attached to each particle is a weight. Whenever a 
particle splits the weight of each descendent equals that of the parent times 
a discount factor. A random tree is thereby produced, with each leaf corre- 
sponding to a particle that has either reached B or been killed. A random 
variable (roughly analogous to a single sample [dfi/di'] {Zi)\(2aG\ ^om the 
importance sampling approach) is defined as the sum of the weights for all 
particles that make it to B. The rule that updates the weights when a par- 
ticle splits is chosen so that the expected value of this random variable is 
p. This numerical experiment is independently repeated a number of times, 
and the sample mean is again used to estimate p. 

There are two potential sources of poor behavior in the splitting algo- 
rithm. The first and most troubling is that the number of particles may be 
large. For example, the number could be comparable 9 for some 9 > 1. In 
settings where a large deviation characterization of p is available, the num- 
ber of levels itself usually grows with the large deviation parameter, and so 
the number of particles could grow exponentially. We will refer to this as 
instability of the algorithm. For obvious computational reasons, instability 
is something to be avoided. The other source of poor behavior is analogous 
to that of importance sampling (and ordinary Monte Carlo), which is high 
relative variance of the estimate. If the weighting rule leads to high varia- 
tion of the weights of particles that make it to -B, or if too many simulations 
produce no particles that make it to B (in which case a zero is averaged in 
the sample mean), then high relative variance is likely. Note, however, that 
this problem has a bounded potential for mischief, since the weights cannot 



be larger than one. Such a bound does not hold for the Radon-Nikodim 
derivative of importance sampling. 

When the probability of interest can be approximated via large devia- 
tions, the rate of decay is described in terms of a variational problem, such 
as a calculus of variations or optimal control problem. It is well known 
that problems of this sort are closely related to a family of nonlinear par- 
tial differential equations (PDE) known as Hamilton-Jacobi-Bellman (HJB) 
equations. In a pair of recent papers [4, 6], it was shown how subsolutions 
of the HJB equations associated with a variety of rare event problems could 
be used to construct and rigorously analyze efficient importance sampling 
schemes. In fact, the subsolution property turns out to be in some sense 
necessary and sufficient, in that efficient schemes can be shown to imply the 
existence of an associated subsolution. 

The purpose of the present paper is to show that in certain circumstances 
a remarkably similar result holds for splitting algorithms. More precisely, 
we will show the following under relatively mild conditions. 

• A necessary and sufficient condition for the stability of the splitting 
scheme associated to a given importance function is that a certain 
scalar multiple of the importance function be a subsolution of the 
related HJB equation. The multiplier is the ratio of the logarithm of 
the expected number of offspring for each split and the gap between 
the levels. 

• If the subsolution property is satisfied, then the variance of the split- 
ting scheme decays exponentially with a rate defined in terms of the 
value of the subsolution at a certain point. 

• As in the case of importance sampling, when a subsolution has the 
maximum possible value at this point (which is the value of the cor- 
responding solution), the scheme is in some sense asymptotically op- 
timal. 

These results are significant for several reasons. The most obvious is that 
a splitting algorithm is probably not useful if it is not stable, and the subso- 
lution property provides a way of checking stability. A second is that good, 
suboptimal schemes can be constructed and compared via the subsolutions 
framework. A third reason is that for interesting classes of problems it is 
possible to construct subsolutions that correspond to asymptotically optimal 
algorithms (see [4, 6]). Subsolutions can be much easier to construct than 
solutions. In this context it is worth noting that the type of subsolution 



required for splitting (a viscosity subsolution [1, 7]) is less restrictive that 
the type of subsolution required for importance sampling. Further remarks 
on this point will be given in Section 5. 

An outline of the paper is as follows. In the next section we describe 
the probabilities to be approximated, state assumptions, and formulate the 
splitting algorithm. This section also presents a closely related algorithm 
that will be used in the analysis. Section 3 studies the stability problem, 
and Section 4 shows how to bound the variance of an estimator in terms 
of the related subsolution. The results of Sections 3 and 4 can be phrased 
directly in terms of the solution to the calculus of variations problem that 
is related to the large deviation asymptotics. However, for the purposes 
of practical construction of importance functions the characterization via 
subsolutions of a PDE is more useful. These issues are discussed in Section 
5, and examples and numerical examples are presented in the concluding 
Section 6. 

Acknowledgment. Our interest in the parallels between importance sam- 
pling and multi-level splitting was stimulated by a talk given by P.T. de 
Boer at the RESIM conference in Bamberg, Germany [2]. 

2 Problem Formulation 

2.1 Problem Setting and Large Deviation Properties 

A domain Z? C M is given and also a sequence of discrete time, stationary, 
Markov D— valued processes {AT"}. Disjoint sets A and B are given, and 
we set r" = min {i : X^ G ^ U B}. The probability of interest is then 

The varying initial conditions are used for greater generality, but also be- 
cause initial conditions for the prelimit processes may be restricted to some 
subset of D. The analogous continuous time framework can also be used 
with analogous assumptions and results. For a given point x ^ Au B, we 
make the following large deviation-type assumption. 

Condition 1 For any sequence Xn — > x, 

lim logp^^{xn) = W{x), 

n~*co n 



where W{x) is the solution to a control problem of the form 

inf / LU{s),^{s)]ds. 

Here L : R'^ x R'^ —^ [0, oo], and the infimum is taken over all absolutely 
continuous functions (f) with (/>(0) = x, (j){t) € B,(j){s) ^ A for all s € [0,t] 
and some t < oo. 

Remark 2 The assumption that {X"} be Markovian is not necessary for 
the proofs to follow. For example, it could be the case that X" is the first 
component of a Markov process (X",!^") (e.g., so-called Markov-modulated 
processes). In such a case it is enough that the analogous large deviation 
limit hold uniformly in all possible initial conditions Yq, and indeed the 
proofs given below will carry over with only notational changes. This can 
be further weakened, e.g., it is enough that the estimates hold uniformly 
with sufficiently high probability in the conditioning data. However, the 
construction of subsolutions will be more difficult, since the PDE discussed in 
Section 5 is no longer available in explicit form. See [6] for further discussion 
on this point. 

It is useful to say a few words on how one can verify conditions like Con- 
dition 1 from existing large deviation results. Similar but slightly different 
assumptions will be made in various places in the sequel, and in all cases 
analogous remarks will apply. 

For discrete time processes one often finds process-level large deviation 
properties phrased in terms of a continuous time interpolation X^{t), with 
X"'{i/n) = X^ and X"(i) defined by piecewise linear interpolation for t 
not of the form t = i/n. In precise terms, process-level large deviation 
asymptotics hold for {X"} if the following upper and lower bounds hold for 
each T G (0, cx)) and any sequence of initial conditions Xn & D with Xn -^ x. 
Define 

Il{<t>) = l L(^(t>{s),4>{s))ds 

if (p is absolutely continuous with (p{0) = x, and l'^{(j)) = cxd otherwise. If F 
is any closed subset of C([0, T] : D) then the upper bound 

limsup - log P {X" G F |X"(0) = x„ } < - inf /J((/)) 

holds, and if O is any open subset of C([0, T] : Z?) then the lower bound 

liminf - log P {X" G O |X"(0) = x^ } > - inf /J((/)) 

n— >oo n 4>£0 



holds. It is also usual to assume that for each fixed x,T, and any M < oo, 
the set 

{<^eC([O,r]:i^):/J(0)<M} 

is compact in C([0, T] : D). The zero-cost trajectories (i.e., paths 4' ^oi 
which lj{(l)) = 0) are particularly significant in that all other paths are in 
some sense exponentially unlikely. 

With regard to Condition 1, two different types of additional conditions 
beyond the sample path large deviation principle are required. One is a con- 
dition that allows a reduction to large deviation properties over a finite time 
interval. For example, suppose that there is T such that if 4> enters neither 
A nor B before T, then Ix{(t>) > W{x) + 1. In this case, the contribution to 
p"(x„) from sample paths that take longer than T is negligible, and can be 
ignored. This allows an application of the finite time large deviation prin- 
ciple. Now let G be the set of trajectories that enter B at some time t < T 
without having previously entered A. By the first condition, the asymptotic 
rates of decay of p^{xn) and P {X^ € G |X"(0) = x„} are the same. The 
second type of condition is to impose enough regularity on the sets A and B 
and the rate function l'^{4)) that the infimum over the interior and closure 
of G are the same. These points are discussed at length in the literature on 
large deviations [8]. 

Example 3 Assume the following conditions: L{-, •) is lower seniicontinu- 
ous; for each x G D,L{x,-) is convex; L{x,-) is uniformly superlinear; for 
each X G D there is a unique point b{x) for which L{x, b{x)) = 0; b is Lips- 
chitz continuous, and all solutions to <p = blcp) are attracted to 9 G A, with A 
open. Let T> C D be a bounded domain that contains A and B, and assume 
(6(x),n(x)) < for x S dD, where n{x) is the outward normal to T) at x. 
Suppose that the cost to go from x to any point in dT> is at least W{x) + 1. 
Then T as described above exists. 

2.2 The Splitting Algorithm 

In order to define a spitting algorithm we need to choose an importance 
function V{y) and a level size A > 0. We will require that V{y) be continu- 
ous and that V{y) < for all y € B. Later on we will relate V to the value 
function W, and discuss why subsolutions to the PDE that is satisfied by 
W are closely related to natural candidates for the importance function. 

To simplify the presentation, we consider only splitting mechanisms with 
an a priori bound i? < oo on the maximum number of offspring. The 
restriction is convenient for the analysis, and as we will see is without loss of 



generality. The set of (deterministic) splitting mechanisms will be indexed 
by j G {1, • • • , J}- Given that mechanism j has been selected, r{j) particles 
(with \r{j)\ < R) are generated and weights w{j) € M^ are assigned to 

the particles. Note that we do not assume Yll=i '^iU) — 1- The class of all 
splitting mechanisms (i.e., including randomized mechanisms) is identified 
with the set of all probability distributions on {1, . . . , J}. 
Associated with V are the level sets 

L, = {yeD: V{y) < z}. 

A key technical condition we use is the following. In the condition. Ex 
denotes expected value given Xq = x. 

Condition 4 Let z € [0, l^(x)] be given and define o"" = min {i : X" E AU L^}. 
Then 

l\mmf--logEx\lr^Up^{X:,.)f]>W{x)+ inf W{y). 

n-^oo n I I^ct"^-^^/ j yGdLz 

Under the conditions discussed after Condition 1 which allow one to con- 
sider bounded time intervals, Condition 4 follows from the Markov property 
and the large deviation upper bound. 

We also define collections of sets < Cq = B, C" = -^(j_i)A/n) J = li • • • f • 
Define the level function /" by /"(y) = min{j > : y € C*^}. The location of 
the starting point corresponds to l'"{x) = \nV{x)/A'\, and P = indicates 
entry into the target set B. The splitting algorithm associated with a partic- 
ular distribution q will now be defined. Although the algorithm depends on 
V,q,r,w,Xn,A,A. and B, to minimize notational clutter these dependencies 
are not explicitly denoted. 

Splitting Algorithm (SA) 

Variables: 

A'^^" number of particles in generation r 

X^f^ position of k particle in generation r 

w^f, weight of /c*'* particle in generation r 

Initialization Step: 

Xn > f^o 1 ^ -'- 



for r = 


1,... 


N^ = 


= 


end 





l^{x. 



Main Algorithm: 

for r = l,...,r(3;„) 

if n:^_^ = then N'^ = 
else 

for i = l,...,Af;_i 

generate Z"!^— a single sample of a process with the 
same law as Xf and initial condition Z^-q = X'!^_^- 

let r-^.=mf{.:Z«^.,GAuCr„(,„)_J 

Splitting Step begin 

if Z".^„ ^A 

let M be an independent sample from the law q 

for k = l,...,|r(M)| 

end 
end 
Splitting Step end 

end 

end 
end 
Construction of a sample: 

once all the generations have been calculated we form 

the quantity 

A'"" 
c" — Y^ l"-(^n) n 



It should be noted that generation 1 consists of all of the particles that 
make it to set CJl^f^^^ and more generally generation j consists of all par- 
ticles that make it to set Cp, ,^_ .. We also define generation to be the 
initial particle. 

An estimate p^ji^ixn) of p"(x„) is formed by averaging a number of inde- 
pendent samples of Sg^. Observe that once generation r has been calculated 
the information about generations to r — 1 can be discarded, and so there 
is no need to keep all the data in memory until completion of the algorithm. 



Also note that in practice there is no need to spht upon entering Cq = B. 




Figure 1: The Sets A and B and Level Sets of V. 



We first need to find conditions under which this splitting algorithm gives 
an unbiased estimator of p"'(x„). To simplify this and other calculations we 
introduce an auxiliary algorithm titled Splitting Algorithm Fully Branching 
(SFB). The essential difference between the two is that the process dynamics 
are redefined in A to make it absorbing, and that splitting continues even 
after a particle enters A. When the estimate is constructed we only count the 
particles which are in B in the last generation, so that the two estimates have 
the same distribution. The SFB algorithm is more convenient for purposes 
of analysis, because we do not distinguish those particles which have entered 
A from those which still have a chance to enter B. Of course this algorithm 
would be terrible from a practical perspective-the total number of particles 
is certain to grow exponentially. However, the algorithm is used only for 
the purposes of theoretical analysis, and the number of particles is not a 
concern. Overbars are used to distinguish this algorithm from the previous 
one. 

Splitting Algorithm Fully Branching (SFB) 
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Variables : 

iV" number of particles in generation r 

X^f^ position of k particle in generation r 

w^f, weight of k particle in generation r 

Initialization Step: 

N^ = l, Xl^=Xn, ^^,1 = 1 

for _r = l,...,/"(x„) 

end 
Main Algorithm: 

for r = l,...,r(x„) 
for j = l,...,iV;„i 

generate Z"^— a single sample of a process with the 
same law as X" and initial condition Z"q = X"_^ • 

let fl^ = mm{i : 2^}^,^ e AU C" (^.^)_ J 

Splitting Step begin 

let M be an independent sample from the law q 

for k= l,...,|r(M)| 

vn _ __ yn 

end 

Splitting Step end 



end 
end 
Construction of a sample: 

once all the generations have been calculated we form 
the quantity 



„n 
*SFB 



. -' * jr) 



El" 
.7 = 1 



i"(a:„) 



i=i ^|x", , ,6Bj"^^"(^•n)J 



Since the distributions of the two estimates coincide 



Ex 







/V" 


>] <M, 


= Ex„ 


>: 


■'=^ 




i=i 






Because of this, the SFB algorithm can be used to prove the following. 
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Lemma 5 An estimator based on independent copies of Sg^ is unbiased if 
and only if 



E 



1. 



Proof. It suffices to prove 



-Err 






p'^ixn) 



We will use a particular construction of the SFB algorithm that is useful 
here and elsewhere in the paper. Recall that with this algorithm every 
particle splits at every generation. Hence the random number of particles 
associated with each splitting can be generated prior to the generation of 
any trajectories that will determine particle locations. As a consequence, the 
total number of particles present at the last generation can be calculated, as 
can the weight that will be assigned to each particle in this final generation, 
prior to the assignment of a trajectory to the particle. Once the weights 
have been assigned, the trajectories of all the particles can be constructed 
in terms of random variables that are independent of those used to construct 
the weights. Since the probability that any such trajectory makes it to B 
prior to hitting A is p'^{xn), 



E^, 



N, 



l"(xn) 



E 



1 






\^l"-(xn)J 



eB]^i"i^^)d 



p'^{Xn)E^^^ 






A simple proof by induction and the independence of the splitting from 
particle to particle shows that 



Err. 



N, 



rn 


= (= 


j = l 


V 



r(A/) 



i=l 



rixu) 



For the rest of the paper we restrict attention to splitting mechanisms 
that are unbiased. 
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3 Stability 

Now let an importance function V, level A, and splitting mechanism (g, r, w) 
be given. Define 

J{x,y)= inf [ LU{s),^{s))ds, (1) 

where the infimum is over absolutely continuous functions. A function W : 
£) ^ M will be called a subsolution if W{x) < for all x € -B and if 
W{x) - W{y) < J{x,y) for all x,y e D\{AUB). In Section 5 we will 
discuss conditions under which W can be identified as a viscosity subsolution 
for an associated PDE. Recall that a splitting algorithm is called stable if 
the total number of particles ever used grows subexponentially as n ^ oo. 
For a given splitting algorithm define 

- , , log Er(M) , , 
Wix) = ^^ V (x). (2) 

In this section we show that, loosely speaking, a splitting algorithm is stable 

if and only if P^ is a subsolution. 

A construction that will simplify some of the proofs is to replace a 

given splitting mechanism by one for which all the weights are constant. 

Thus {q,r,w) is replaced by {q,r,w), where for each j = 1,...,J and 

i = ^,---,r{j), 

J 

[m{j)]-'=Er{M)=Y,r{j)qr 

j=i 

The new splitting mechanism is also unbiased, and the distribution of the 
number of particles at each stage is the same as that of {q, r, w). 

To establish the instability we make a very mild assumption on a large 
deviation lower bound for the probability that an open ball is hit prior to 
reaching A. This assumption can be expected to hold under conditions 
which guarantee Condition 1. 

Proposition 6 Consider an importance function V , level A, and split- 
ting mechanism {q,r,w), and define W by (2). Suppose there exists y G 
D\ (A U B) such that W{y) > and 

W{x)-W{y)>J{x,y). (3) 
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Assume that J{x,y) is continuous at y. Let p^{xn) he the probability that 
X^ enters the ball of radius 6 > about y before entering A, given Xq = Xn, 
and assume 



liniinf — logp"(2;„) > 

n-->oo n 



inf J{x,z). 

z:\z-y\<5 



Then the corresponding splitting algorithm is not stable. 

Proof. It is enough to prove the instabiUty of the algorithm that uses 
{q,r,w). Since J'{x,y) > 0, V{y) < V{x). From the definition of W in (2) 
and (3) there exist 5 > and e > such that for all z with |y — z| < 6, 

, , , , ,, log £^r(M) 

[Vix) - Viz)] ^ ^^ ' > J{x, z) + e. 

Let S = {z ■.\y — z\ < 5}. By taking 5 > smaller if necessary we can 
guarantee that S" PI j4 = and V{z) > for all z E 5. 




Figure 2: Level Sets of V in Proof of Instability. 



Suppose one were to consider the problem of estimating p^[xn)- One 
could use the same splitting mechanism and level sets, and even the same 
random variables, except that one would stop on hitting A oi S rather 
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than A or B, and the last stage would correspond to some number m" < 
l^ixn)- Of course, since V is positive on S it will no longer work as an 
importance function, but there is a function V = V — a that will induce 
exactly the same level sets as V and which can serve as an importance 
function for this problem. See Figure 2. The two problems can be coupled, 
in that exactly the same random variables can be used to construct both 
the splitting mechanisms and particle trajectories up until the particles in 
the p^{xn) problem enter C^. 

If a particular particle has not been trapped in A prior to entering Cf, 
then that particle would also not yet be trapped in A in the corresponding 
scheme used to estimate p"(x„). Note also that the number of particles that 
make it to 5" = Cq are at most R times the number that make it to C". 
Let N^n denote the number of particles that make it to S in the SA used 
to approximate p"'(x„), and let iV^.^ -. be the number used in the SA that 

approximates p'^{xn)- Then NJ\, ■. > N^„/R. 

Using the SFB variant in the same way that it was used in the proof of 
Lemma 5 and that the mechanism (q, r, id) is used, 



p'^iXn) 



Exr, 



E^ 



N" 



E 



w„ 



rn"-,] 



m"' 

E {Er{M)y 



i=i 



[Er{M)Y 



-Err 






We now use the lower bound on p'^{xn) and that m^/n -^ [V{x) — sup^g^ ^i^)] /A: 



1 



AT" 



liminf-log^^„ 

n— >oo n 

= liminf - log E,„ Ip'^ixn) [Er{M)Y 

n—KX) n L 

> - inf J(x, .) + [^("^ - ^"^P-^^ ^("^] log [Er{M)] 



> inf 

zes 

> e. 



[V{x) - V{z)] 



\og[Er{M)]-J{x,z) 



It follows that 



lim inf — log E^^ 

n— >oo n 



/V" 



>e>0. 
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which completes the proof. ■ 

The next proposition considers stabihty. Here we will make a mild as- 
sumption concerning a large deviation upper bound, which can also be ex- 
pected to hold under conditions which guarantee Condition 1. 

Proposition 7 Consider an importance function V, level A, and splitting 
mechanism {q,r,w), and define W by (2). Suppose that 

W{x)-W{y)<J{x,y) 

for all x,y (^ D\{AUB) and that W{y) < for all y & B. Consider any 
a G [0, T^(x)], letpp{xn) he the probability that X^ enters level set La before 
entering A (given Xq = Xn), and assume 

limsup — logp"(x„) < — inf J{x^z). 

n — ^oo ri zGLa 

Then the corresponding splitting algorithm is stable. 

Proof. For each n let r" be the value in {1, ... , /"(x)} that maximizes r -^ 
Ex i^r]- Since r'^/n is bounded, along some subsequence (again denoted 
by n) we have r^/n ^ f € [0, y(x)/A]. Using the usual argument by 
contradiction, it is enough to prove 

limsup-log^:,JiV;5.] <0 

n^oo ri 

along this subsequence. First suppose that v = 0. Given 5 > 0, choose 
n < oo such that r'^/n < 6 for all n > n. Then N^^n^" < R , and so 
limsup„^oo ^logEx^ [N^n^"-] < 5 ■ logR. Since 5 > is arbitrary, this case 
is complete. 



Now assume v G {0,V{x)/A] and let 5 £ (0,v) be given. Suppose one 
were to consider the problem of estimating p"'(x„) as defined in the statement 
of the proposition, with a = V{x) — A{v — 6). We again use the same 
splitting mechanism and level sets, except that we now stop on hitting A or 
Lv(x)-A{v-S)- An importance function with these level sets can be found by 
adding a constant to V. We again couple the processes. Observe that entry 
into Cf for the p^{xn) problem corresponds to entry into d^^n in the p^{xn) 
problem for some m" such that m'^/n -^ [V{x)/A] — (v — 6) as n ^ cxd. 
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Figure 3: Level Sets of V in Proof of Stability. 



Note that particles generated upon entry into the set C^n will correspond 
to generation l^{x) — m"" of the algorithm. The final generation of the 
algorithm to estimate p^{xn) is generation /"(x) — m^ + 1, corresponding 
to the particles generated upon reaching Cq = -^\/(x)-A(t;-5)- Observe that 
since C^n_|_i C iy(x)-A(t;-5) every particle in the algorithm used to estimate 
P^{xn) that is not trapped in A by stage /""(x) — m" + 1 will make it to the 
target set -^y(a;)-A(D-5) iii the algorithm used to estimate p"(a;„). Hence 
iV;'i(^)_^n+i > iVr"(x)-m"+i' where iV,'^(^)_„„+i denotes the number of such 
particles for the SA used to estimate p"(a;„). 

We again use the SFB variant in the same way that it was used in the 
proof of Lemma 5 and the (g, r, w) splitting mechanism to obtain 



f{Xr^ 



[Er{M)] 



-(P(x)-m"+l) 



Err 



^^P{a;)-m"+l 



It follows from m"/n -^ \y{x)//\] — {v — 5) that (P(x) — m" + l)/n 



5]. 
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Using the upper bound on p^{xn)-, 



lim sup — log E. 

n— »oo n 

= lim sup — log E,^ 

n— >oo fl 

< - inf 






J{x,z) + [v-5]\og[Er{M)] 



V{x)-A{v-6) 



< sup 

zGL 



V{x)-A{v-5) 



A 



\og[Er{M)]-J{x,z) 



< 0. 



For sufficiently large n we have r"^ — {l"'{x) — m" + 1) < 25n/A, and hence 
iVA < iV,"„(,)_^„+i • «2'5"/A. It follows that 

limsup-log£;,„ [iV,"„] < (2VA) • logi?, 
and since (5 > is arbitrary the proof is complete. ■ 

4 Asymptotic Performance 

Since the sample Sg^ has mean p"(x„), any estimator constructed as an 
average of independent copies of Sg^ is unbiased and has variance propor- 
tional to var^;^ [^sa]- O^ce the mean is fixed, the minimization of var^;^ [^saI 
among splitting algorithms is equivalent to the minimization of Ex,^ [•SsaI ■ 
It is of course very difficult to find the minimizer in this problem. When a 
large deviation scaling holds, a useful alternative is to maximize the rate of 
decay of the second moment, i.e., to maximize 



lim inf log E^^ [ssa] ^ = li™ inf log E^^ 

n— »oo n n— >oo n 



■n: 



l"(x„) 



E 



w 



l"iXn),j 



By Jensen's inequality the best possible rate is 2W{x): 



l[mmi--\ogEx„ [4j,f > lim inf-- log ^,„ [sg^] > 2W{x) 

n—^cx) fi n— >oo tl 

The main result of this section is the following. 
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Theorem 8 Consider an importance function V , level A, and splitting 
mechanism {q,r,w), and define W by (2). Suppose that 

W{x)-W{y)<J{x,y) 

for all x,y £ D\ {A U B) and that W{y) < for all y £ B. Assume also 
that Conditions 1 and 4 hold. Then 

lim —\ogE,^ [sl^f = Wix) - Vix) ^ . 

Proof. It is sufficient to consider the SFB algorithm and prove that 

2 



lim log Ex 

n— >oo n 



E 1 



W{x) - V{x) 






A 



The proof is broken into upper and lower bounds. 
We first prove 



lim sup log Ex 



V" 

E 1 






(4) 



logf^Eif ^.W 



< W{x)-V{x)- 

In the following display we drop cross terms to obtain the inequality, and 
then use the same construction as in Lemma 5 under which the weights and 
trajectories are independent to obtain the equality. 

2 



lim sup log-E'x„ 

n— >oo n 



N, 



E 1 

j 



< lim sup log Ex 

n— »oo ri 



n 



limsup log I p'^{xn)Ex 

n— >oo n 



E 



w 



l''{Xu),j 
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Suppose we prove that for any k (and in particular k = P(xn)), that 



E^, 






r{M) 



(5) 



i=l 



Since l^{xn) = \nV{xn)/^~\, (4) will follow from Condition 1. The proof of 
(5) is by induction. Let Mj denote the independent random variables used 
to define the splitting for the jth particle at stage k. Then 



-Er 






Err. 



Err. 



i=l 



' N^ r{Mj) 

E«.)'E^^(^.)' 

EK 



r{M) 



We now turn to the proof of the lower bound 




A/"" 



liminf log£'x„ 

n— >oo Ti 



> W{x) - V{x) 



E ^/x" 

7^ I '" 



i^n),i 



eB 



}^P(x„ 



),i 



log(ii;Eif^^.(M)^) 



A 



(6) 



For each stage k, let MJ} , denote the independent random variables used in 
the splitting of particle j S {l, . . . , -/V"}. Also, let /" ■ denote the disjoint 
decomposition of the particles in {l, . . . ,iV"^_]^} according to their parent 
particle. Observe that if k,l G I^jik ^ I, then for all particles descended 
from k and /, k is the time of their last common ancestor. Given k G I"^-, 
^ denote the descendants of this particle at stage l"'{xn)- With 



let I; 



K+1,1"{X„) 
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this notation we can write 



-Et. 



E 1 






K=0 



TV" 



E E E 



I ~ 

1_ i"(a;„),m;, J 



1, 



™'i*=-',^+l,I"(a:„),i 



+ ^X 



E 1 






Let w" ■ denote the products of the weights accumulated by particle 
j G {l, . . . , iV^I up to stage k, and let 'w^_^_i i^u ) m denote the product of 

the weights accumulated by particle m G s 1, • • • ^^Jl, ^ > between stages 
K + 1 and the final stage. Finally, let TJ^ denote the sigma algebra generated 
by -M""-, s G {1, . . . , k} , J € {l, . . . , iV"} and the random variables used 
to construct X" • for these same indices. Note that the future weights are 
independent of JF", and that the distribution of X^, . ^ depends on TJ^ 
only through X^ ■ if /c G /"■ and rii G /", , ,„/ % , . We introduce the 
notation 






ryn 



m&r 



E 

I 



{^rn(.„),™eB}^«+M"(-n) 



21 



By conditioning on J^ we get 



E^, 



E E E 






(7) 



+ l,l'^(x„).k 



E 1 



^'£-^"+1,;" 



I. l"-(xn),mi J 



+ l,i"(a:„),i 



X77 




Xji, 


EK. E 



Using again the independence of the weights and trajectories as used in the 
proof of Lemma 5, we have 



E^^\zi,]=p^xi;). 



Since 



W = eY^ Wk{M)wi{M) = E 

the final expression in (7) equals 

'n: 
WE, 



Y,MM) 



E 



Y^MMf 






We conclude that 



E,, 



l"{xn) 

E 1 



^ \ l"{xn),3 J 



iXn),j 



l"{x„)-l 

K=0 



m 






+E, 



Af" 

l"(xn) 

E 1 
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For a final time we use that the weights and trajectories are independent, 
and also (5), to argue that for any bounded and measurable function F and 
any stage k, 



Err 






E.„[F{X:^,)]E,^ 



E 



r{M) 






^-. [F (^",i)] • 



Thus 



Ex„ 



N, 



E 1 






r(z„)-l / r(Af) 



K=0 



i=l 



r(M) 

E Y, MM)' 

i=l 



l'\x„) 



En- 



|^i"(:r„),l^^/. 



Since l"'{xn) is proportional to n, to prove (6) it is enough to show that 
if K„ is any sequence such that K„/n ^ u € [0, V{x)/ /\], then 



lim inf log Ex„ 



r{M) 

E Y, MM? I £;.„ 

1=1 



> W{x)-V{x)- 






A 



Observe that {XJ^^^^ ^ A} implies X;^^^i G C'Pny(x)/Al-«„- ^y Condition 4, 
^r>^cS^-^°Si^-„ [l{x„" .^AjP'^ (^:„,i)'] > W{x) + inf W^(y). 

n->oo n LI Kn.i^ / J V&oLv(x)-vA 

By the subsolution property, W{y) > V (y) log Er{M)/ A. By Holder's 
inequality 

r{M) r{M) 

E Y^ Wi{Mf ■ Er{M) > E ^ MM) = I, 

i=l i=l 
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and therefore 



r{M) 



log E ^ Wi{Mf < log{Er{M)) . 



Thus 



hni inf log Ex 

n—i-oo n 



i=l 



r{M) 



E Y, MM? E,^ ii{x:„,,^A}^'^ (^:„,i) 



i=l 



r(M) 



> -vlog I eY^ Wi{Mf + W{s 



inf '^^i^^Viy) 



i=l 

r{M) 



l/e9Lv(x)-i;A 



'log Le ^ Wi{Mf + W{x) + logEr(A/) 



j=i 



A 



> W{x)-V{x)- 



r(M) 



log^EilT^^^W 



and the proof is complete. 



4.1 Design of a Splitting Algorithm 

Suppose that V{x) and A are given and that we choose a splitting mechanism 
(g, r, w) which is unbiased and stable. By Theorem 8 the asymptotic rate of 
decay of the second moment is given by 



W{x)-V{x)- 



r(M) , 



\og[EY:,':V m{M) 

A 



Recall from the proof of the last theorem that 

/ r{M) \ 

- log Lb ^ Wi{Mf < log {Er{M)) . 

Further equality holds if and only if Wi{m) = ^Jj^^\ for alH € {!,..., r{m)} 
and all m G {1, . . . , J}. Given the value u = Er{M), an alternative splitting 
mechanism which is arguably the simplest which preserves the value and 
achieves the equality in Holder's inequality is that defined by J = 2 and 

Qi = \u] -u,q2 = l- qi, r(l) = \u] ,r(2) = [u\ , Wi{j) = l/u all i,j. (8) 
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Given a subsolution W, the design problem and the performance of the 
resulting algorithm can be summarized as follows. 

• Choose a level A and mean number of particles u, and define an im- 
portance function V by logu ■ V{x)/A = W{x). Define the splitting 
mechanism by (8). The resulting splitting algorithm will be stable. 



• 



If Sg^ is a single sample constructed according to this algorithm, then 
we have the asymptotic performance 

hm -l\ogE,J{s^^f] =Wix) + W{x). 

n—>oc n 

• The largest possible subsolution satisfies W{x) = W{x), in which case 
we achieve asymptotically optimal performance. 

Remark 9 Although the subsolution property guarantees stability, it could 
allow for polynomial growth of the number of particles. If in practice one 
observes that a large number of particles make it to B in the course of 
simulating a single sample s^^, then one can consider reducing the value of 
A slightly, while keeping the mechanism and V fixed. In PDE parlance, this 
corresponds to the use of what is called a strict subsolution. Because the 
value of W{x) is lowered slightly, there will be a slight increase in the second 
moment of the estimator. However, the strict inequality provides stronger 
control, and indeed the expected number of particles and moments of the 
number of particles will be bounded uniformly in n. 

5 The Associated Hamilton- Jacobi-Bellman Equa- 
tion 

The probability ^"(x) is intimately and naturally related, via the exponential 
rate W{x), with a certain nonlinear PDE. This relation is well known, and 
follows from the fact that W is characterized in terms of an optimal control 
or calculus of variations problem. We begin this section by defining the PDE 
and the notion of a subsolution in the PDE context. 

Our interest in this characterization is because it is more convenient for 
the explicit construction of subsolutions than the one based on the calculus 
of variations problem. See, for example, the subsolutions constructed for 
various large deviation problems in [6] and [4]. (It should be noted that the 
constructions in these papers ultimately produce classical subsolutions. In 
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contrast, the splitting algorithms require only the weaker viscosity subso- 
lution property. However, the smoother subsolutions constructed in [4, 6] 
are obtained as mollified versions of viscosity subsolutions, and it is the 
construction of these unmollified functions that is relevant to the present 
paper.) Other examples will be given in the next section. Since our only 
interest in the PDE is as a tool for explicit constructions, we describe the 
characterization formally and in the simplest possible setting, and refer the 
reader to [1, 7]. 
For qeW^, let 

M{x,q)= mi[{q,P)+L{x,p)]. 



Then under regularity conditions on L and the sets A and B, W can be 
characterized as the maximal viscosity subsolution to 

- ( Q r G BE 

n^,DW{x)) = Q,xiAuB, W{x) = }^^ ^^^^ . 

A continuous function M^ is a viscosity subsolution to this equation and 
boundary conditions if W{x) < for x e dB, W{x) < oo for x € dA and 
if the following condition holds. If </> : M — > M is a smooth test function 
such that the mapping x — > [W^(x) — (/'(x)] attains a maximum at xq G 
M'^\(^US), then m{xo,D(j){xo)) > 0. If W is smooth at xq then this 
implies M{xo,DW{xo)) > 0. 

Note that EI(x, •) is concave for each xq S W^, and hence the pointwise 
minimum of a collection of subsolutions is again a subsolution. It is this 
observation which makes the explicit construction of subsolutions feasible in 
a number of interesting problems (see [6]). 

In Section 2 we defined W{x) to be a subsolution to the calculus of 
variations problem if it satisfied the boundary inequalities and 

W{x)-W{y) <J{x,y) 

for all x,y G W^\{AU B). We now give the elementary proof that these 
notions coincide. Let W{x) — <j){x) attain a maximum at xq. Thus for any 
(3 £ W'- and all a £ (0, 1) sufficiently small, W{xo + a/3) - 0(xo + a/3) < 
W{xo) — (J){xq), and so 

(/.(xo) - (/)(xo + ap) < W{xo) - W{xo + a/3) 
< J{xo,xo + a(3). 
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Since J{x,y) is defined as an infimum over all trajectories that connect x 
to y, we always have 

J{xo,xo + al3) < / L{xo + sl3,l3)ds. 
Jo 

Hence if, e.g., the mapping x — > L{x,j3) is continuous, then for all (3 

(/>(xo) - (I){xq + a/3) < L{xo,P)a + o{a). 

Using Taylor's Theorem to expand (f), sending a | and then infimizing over 
P gives 

< inf^ [{D(l){xo),(3) + L(xo, /3)] < M{x, Dct){xo)). 



Thus W^ is a subsolution. 

The calculation just given does not show that W [the solution to the 
calculus of variations problem] is the maximal viscosity subsolution. Also, 
we have not shown that subsolutions to the PDE are also subsolutions in 
the calculus of variations sense. 

The characterization of W as the maximal viscosity subsolution requires 
that we establish W{x) < W{x) whenever W is a viscosity subsolution. A 
standard approach to this would be to show that given a viscosity subsolu- 
tion W, any point x ^ AU B, and any e > 0, there exists a smooth classical 
subsolution W^ such that W^{x) > W{x) — e. When this is true the classi- 
cal verification argument [7] can be used to show W'^{x) — W^{y) < J{x^ y), 
and since e > is arbitrary P^ is a subsolution to the calculus of variations 
problem. This implies W{x) > W{x). 

Invoking smooth subsolutions brings us very close to the method of con- 
structing nearly optimal importance sampling schemes as described in [4, 6], 
where the design of the scheme must be based on the smooth classical subso- 
lution W^{x) rather than W{x). In all the examples of the next subsection 
the inequality W{x) < W{x) can be established by constructing a nearby 
smooth subsolution as in [4, 6]. 

6 Numerical Examples 

In this section we present some numerical results. We study four prob- 
lems: buffer overflow for a tandem Jackson network with one shared buffer, 
simultaneous buffer overflow for a tandem Jackson network with separate 
buffers for each queue, some buffer overflow problems for a simple Markov 
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modulated queue and estimation of the sample mean of a sequence of i.i.d. 
random variables. 

For each case, we present an estimate based on a stated number of runs, 
standard errors and (formal) confidence intervals based on an empirical es- 
timate of the variance, and total computational time (intended only for 
comparing cases). We also present the maximum number of particles, which 
is the maximum over all runs of niaXr<in.(xn) ^r, as well as the empirical 
mean and standard deviation of max^<;n(j,^\ A^^. 

Subsolutions, even among those with the maximal value at a given point, 
are not unique, and indeed for the problems to be discussed there are some- 
times a number of reasonable choices one could make. We will not give 
any details of the proof of the subsolution property, but simply note that in 
each case it can be proved by a direct verification argument as discussed in 
Section 5. 

6.1 Tandem Jackson Network - Single Shared Buffer 

Consider a stable tandem Jackson network with service rates A < min{^^, /i2}- 
Suppose that the two queues share a single buffer and that we are interested 
in the probability 

p"' = -P{o,o) {total population reaches n before first return to (0,0) } 

It is well known that 

lim logp" = ioam{pi, P2} 

n— »oo n 

where pj = log^. Further, the (continuous time) Hamiltonian that corre- 
sponds to subsolutions of the relevant calculus of variations problem is 

H(p) = -[A(e-P^ - 1) + pi(e(Pi-P2) - 1) + ^2(6^' " !)]• 

(see [4] for the discrete time analogue) . Without loss of generality (see [4] ) 
one can assume that ^2 < /^i- By inspection M{p) = for p = — log ^{l, 1) 
(this root is suggested by the form of the escape region), and W{x) = 
{p, x) + log ^ is a subsolution which is in fact the solution and so leads 
to an asymptotically optimal splitting scheme. The table below shows the 
results of a splitting simulation with 20,000 runs for A = 1, //^ = /X2 = 4.5 
and for various values of n. 
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n 


30 


40 


50 


Theoretical Value 


2.63 X 10"^" 


1.03 X lO"^-' 


3.80 X lO"''^ 


Estimate 


2.67 X 10^^" 


1.06 X lO^-^"' 


3.71 X 10^"^ 


Std. Err. 


0.11 X 10"^" 


0.05 X 10"^* 


0.20 X 10"''^ 


95% C.I. 


[2.45,2.88] X lO"-"* 


[0.97, 1.16] X 10"^* 


[3.32,4.10 X lO"''^ 


Time Taken (s) 


21 


52 


95 


Average no. particles 


23 


31 


37 


S.D. no. particles 


103 


155 


220 


Max no. particles 


2877 


4803 


8369 



Table 1. A = 1, /i^ = /i2 = 4.5, asymptotically optimal scheme. 

It was noted in Remark 9 that the number of particles generated may 
grow subexponentially in n and this appears to be reflected in the data. 
Following the suggestion of the remark, we also considered a slightly subop- 
timal but strict subsolution in the hopes of better controlling the number of 
particles with little loss in performance. The table below shows the results 
of numerical simulation for the same problem with a splitting algorithm 
based on the subsolution 0.93 x W{x). Again each estimate is obtained us- 
ing 20,000 runs. The results are in accord with our expectations. 



n 


30 


40 


50 


Theoretical Value 


2.63 X 10^^" 


1.03 X 10^^* 


3.80 X 10^''^ 


Estimate 


2.72 X 10"^" 


1.08 X 10"^* 


3.67 X 10"''^ 


Std. Err. 


0.15 X 10"^" 


0.08 X lO"''* 


0.32 X 10"''^ 


95% C.I. 


[2.43, 3.02] X lO^-'" 


[0.93, 1.23 X 10^^* 


[3.04, 4.30] X 10^''^ 


Time Taken (s) 


5 


g 


11 


Average no. particles 


8 


8 


8 


S.D. no. particles 


26 


28 


28 


Max no. particles 


628 


794 


632 



Table 2. A = 1, //^ = ^2 = 4.5, asymptotically suboptimal scheme. 

6.2 Tandem Jackson Network - Separate Buffers 

In the paper [9] the authors address the problem of asymptotic optimality 
for splitting algorithms. In particular they consider an approach to choos- 
ing level sets that are claimed to be "consistent" with the large deviations 
analysis and show that this does not always lead to asymptotically opti- 
mal algorithms. They illustrate their results by considering, for a tandem 
Jackson network, the problem of simulating the probabilities 

p^ = -P(o,o) {both queues simultaneously exceed n before first return to (0,0)} 

It is shown that 

1 

logp" = p-^+ P2 = l, 



lim 

n— »oo 



1 



n 



and the authors propose a splitting algorithm based on the importance func- 
tion U{x) = 7 — 7min{3;i,a;2}, which is just a rescaling of the target set 
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B = {{x,y) : X > n or y > n}. They show that although the level sets given 
by this function may intuitively seem to agree with the most likely path to 
the rare set identified by the large deviations analysis, the resulting splitting 
algorithm in fact has very poor performance. By analyzing this importance 
function using the subsolution approach it is very easy to see why this is 
the case. The Hamiltonian corresponding to subsolutions is the same as in 
the previous section and it is clear to see that U{x) is not a subsolution. 
However, the function 



7 - Pi^i - P2X2 



W{x) 

is a subsolution. Further M^(0) = 7, thus the corresponding importance 
function will lead to an asymptotically optimal splitting algorithm. Nu- 
merical results are presented for the cases A = l,//^ = 3, /X2 = 2 and 
A = 1,/ii = 2,/i2 = 3 which are the same rates originally considered in 
[9]. Each estimate was obtained by a simulation using 20,000 runs. 



n 


10 


20 


30 


Theoretical Value 


9.64 X 10"" 


1.60 X 10"-"' 


2.64 X lO"^'' 


Estimate 


9.74 X Kr" 


1.58 X 10"-'=' 


2.66 X lO"^"* 


Std. Err. 


0.17 X 10"° 


0.03 X 10"-''' 


0.06 X 10"^" 


95% C.I. 


[9.41, 10.1 X 10"" 


[1.52, 1.65| X 10"^'" 


[2.54, 2.79 X lO"^'' 


Time Taken (s) 


25 


188 


639 


Average no. partieles 


25 


53 


81 


S.D. no. particles 


47 


123 


213 


Max no. particles 


550 


1690 


3130 



Table 3. A = 1,/^^ = 3, ;U2 = 2, asymptotically optimal scheme. 



n 


10 


20 


30 


Theoretical Value 


9.64 X 10"" 


1.60 X lO"-'^ 


2.64 X 10"^^ 


Estimate 


9.50 X 10"" 


1.56 X 10"-'=' 


2.68 X 10""^^ 


Std. Err. 


0.26 X 10"" 


0.06 X lO"-'^ 


0.13 X lO"^'' 


95% C.I. 


[8.99, 10.0] X 10"" 


[1.44, 1.68] X 10"^'' 


[2.43,2.94] X lO""'^ 


Time Taken (s) 


19 


136 


468 


Average no. particles 


26 


54 


86 


S.D. no. particles 


74 


222 


448 


Max no. particles 


1055 


4905 


11350 



Table 4. X = 1, fii = 2, fj,2 = 3, asymptotically optimal scheme. 

As expected, these results show a vast improvement over those obtained 
in [9]. Finally the tables below show the results of numerical simulation 
for the same problem with a splitting algorithm based on the subsolution 
0.95 X W{x). 



n 


10 


20 


30 


Theoretical Value 


9.64 X 10"" 


1.60 X lO"-'-"' 


2.64 X 10"'^'' 


Estimate 


9.51 X 10"" 


1.59 X 10"-'-'' 


2.78 X lO"^'' 


Std. Err. 


0.20 X 10"" 


0.05 X 10"-'" 


0.14 X lO"^"* 


95% C.I. 


[9.12, 9.91] X 10"" 


[1.48, 1.70] X lO"^'' 


[2.51, 3.05 X 10"'^'' 


Time Taken (s) 


12 


53 


98 


Average no. particles 


13 


18 


19 


S.D. no. particles 


24 


37 


41 


Max no. particles 


255 


416 


477 



30 



Table 5. A = 1,/i]^ = 3,/i2 = 2, asymptotically suboptimal scheme. 



n 


10 


20 


30 


Theoretical Value 


9.64 X 10^" 


1.60 X lO"-''' 


2.64 X lO"^'"' 


Estimate 


10.3 X 10"" 


1.52 X 10"-"' 


2.32 X lO"^'' 


Std. Err. 


0.32 X 10"" 


0.08 X 10"-'=' 


0.20 X 10""^^ 


95% C.I. 


[9.66, 10.9] X 10"" 


[1.35, 1.68] X 10"' = 


[1.92,2.72] X 10"^^ 


Time Taken (s) 


10 


37 


67 


Average no. particles 


15 


18 


19 


S.D. no. particles 


39 


61 


72 


Max no. particles 


636 


1266 


2060 



Table 6. A = l,/i| = 2,/i2 = 3, asymptotically suboptimal scheme. 

The choice of W{x) = 7 — PiXi — P2X2 as a subsolution may seem arbi- 
trary, however it turns out to be a very natural choice. Given a > con- 
sider a "nice" set B such that for the subsolution Wa{x) = a — p^xi — P2X2, 
Br^{x: Wc{x) > 0} = and Bn{x: Wa{x) = 0} / 0. Then 



lim 

n— »oo 



1 



n 



logp" = a, 



where 



p^ = P(o,o)(lueue reaches nB before first return to (0,0)). 

Intuitively this means that all points on a level set of the function Wa{x) 
have the same asymptotic probability. Thus given any such nice set B we 
can identify its large deviations rate by finding the unique a* such that 
Br\{x: Wa*{x) > 0} = and S n {x : Wc,*{x) = 0} ^ 0. Further Wa*{x) 
will lead to an asymptotically optimal scheme. 

That the family of functions Wa has such a property is because the sta- 
tionary probabilities for a stable tandem Jackson network have the product 
form 7r({i,j}) = (1 — Pi){'\. — P2)PiP2- Indeed, by using an argument based 
on the recurrence theorem, we can see that every stable tandem Jackson 
network has a family of affine subsolutions with the same property. Further 
this will be true for any iV-dimensional queueing network for which the sta- 
tionary probabilities vr have asymptotic product form, by which we mean 
that there exist pi, . . . , pj^ such that for any nice set B 

lim log7r(ni?) = infjxip]^ H \-xnPn ■ (^^ii • • ■ j^n) S B}. 

6.3 Non-Markovian Process 

Since many models are non-Markovian we present an example of splitting 
for a non-Markovian process. Consider a tandem network whose arrival 
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and service rates are modulated by an underlying process Mt which takes 
values in the set {1,2}, such that the times taken for the modulating pro- 
cess to switch states are independent exponential random variables with 
rate 7(1) if M is in state 1 and 7(2) otherwise. Let A(l),/i]^(l),/i2(l) and 
A(2),/i^(2),;U2(2) be the service rates of the queue in the first and second 
states respectively. It is known (see, e.g., [5]) that the Hamiltonian can be 
characterized in terms of the solution to an eigenvalue/eigenvector problem 
parameterized by p. This characterization is used for calculating the various 
roots to M.{p) = used below. 

Consider again the single shared buffer problem. Let A(l) = l,/i]^(l) = 
3.5,/X2(l) = 2.5,7(1) = 0.2 and A(2) = l,/ii(2) = 4.5,^2(2) = 4.5,7(2) = 
0.5. Using a verification argument, one can show that W{x) = 1.00029(1 — 
^1 — X2) is a subsolution with the maximal value T^(0). Thus using W{x) 
leads to an asymptotically optimal splitting scheme. The results of simula- 
tions run using this importance function are shown below, where again each 
estimate was derived using 20,000 runs. 



n 


30 


40 


50 


Theoretical Value 


6.36 X 10-^-' 


2.88 X 10"-" 


1.30 X IQ-^^ 


Estimate 


6.66 X 10^^^ 


2.89 X 10^-" 


1.27 X 10^^^ 


Std. Err. 


0.23 X 10^^-' 


0.13 X 10^-" 


0.06 X 10^^^ 


95% C.I. 


[6.23,7.11] X lO"-'-^ 


[2.66,3.11] X IQ-^' 


[1.16, 1.38] X IQ-^^ 


Time Taken (s) 


8 


14 


20 


Average no. particles 


4 


5 


5 


S.D. no. particles 


10 


11 


13 


Max no. particles 


188 


214 


280 



Table 7. Markov-modulated network, total population overflow. 

It is also worth revisiting the separate buffers problem for the same 
queueing network. For the same arrival and service rates one can again use 
a verification argument to show that W{x) = 2.2771 — 1.29532;i — 0.9818x2 
leads to an asymptotically optimal splitting scheme. Results of a simulation 
using 20,000 runs are shown below. 



n 


10 


20 


30 


Theoretical Value 


8.36 X IQ-"' 


1.07 X lO"-"* 


1.39 X IQ-^" 


Estimate 


8.24 X 10^^" 


1.04 X 10^ -'■^ 


1.36 X 10-^" 


Std. Err. 


0.19 X 10^^" 


0.03 X 10" -'^ 


0.05 X 10^^" 


95% C.I. 


[7.85,8.62] X 10"-"' 


[0.98, 1.10] X IQ-^" 


[1.26, 1.45] X IQ-^" 


Time Taken (s) 


22 


150 


479 


Average no. particles 


31 


59 


89 


S.D. no. particles 


76 


182 


336 


Max no. particles 


1076 


3228 


7871 





Table 8. Markov-modulated network, simultaneous separate buffer 

overflow. 
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Finally we investigate what happens in this case if we use a strict sub- 
solution as importance function. The table below shows the results of a 
simulation using 20,000 runs based on the importance function 0.95 x W. 



n 


10 


20 


30 


Theoretical Value 


8.36 X !()-"' 


1.07 X 10" -'■J 


1.39 X 10"^" 


Estimate 


8.28 X 10""' 


1.06 X lO"-'" 


1.43 X 10"^" 


Std. Err. 


0.21 X 10"^" 


0.04 X lO"-'"^ 


0.07 X 10"^" 


95% C.I. 


[7.87,8.69] X 10"-"' 


[0.98, 1.14] X lO""' 


[1.29, 1.56] X 10"^" 


Time Taken (s) 


15 


70 


157 


Average no. particles 


22 


31 


36 


S.D. no. particles 


50 


8S 


116 


Max no. particles 


707 


1469 


3021 



Table 9. Markov-modulated network, asymptotically suboptimal scheme. 

6.4 Rare Events for the Sample Mean 

It is also worth noting that this approach works just as well for finite 
time problems. Assume that Xi,X2,... is a sequence of i.i.d. N(0,/^) 
random variables where l'^ is the A^-dimensional identity matrix and let 
^n = ^ Sr=i -^i- Suppose that we are interested in simulating the sequence 
of probabilities 



P 



P{Sn(^C} 



for some set C such that C does not include the origin. For j € {1, . . . , n} 
let Sn{j) = ^ X]i=i Xi. Then given sequences Xn, jn and x G M^, t G [0, 1] 
such that liuin^ooXn = X and lim„^oo Jn/^ = t, the large deviations result 

lim --logP{Sn G C\SniJn) = X„} = W{x,t) 
n^oo 77, 

holds. Further the PDE corresponding to solutions of the calculus of varia- 
tions problem is (see [6]) 

Wt + miM{DW;p) = 0, 

where H(s;/?) = {s,l3) + L{(3) and L(/3) = ||/3f/2. We can put this into 
the general framework in the standard way, i.e., by considering the time 
variable as simply another state variable. The set B, for example, is then 
C X {!}. Strictly speaking this problem does not satisfy the conditions used 
previously, since the sets A and B no longer have disjoint closure. Although 
we omit the details, it is not difficult to work around this problem. 
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It is easy to see that any affine function of the form 



W{x,t) 



{a,x) 



\a\ 



[l-t)H{a), 



where H{a) = ||a||"/2, is a subsolution, though it may not have the optimal 
value at (0, 0) and may not be less than or equal to zero on B. We can use 
the fact that the minimum of a collection of subsolutions is also a subsolution 
to build a subsolution which satisfies the boundary condition and has the 
maximal value at (0,0). For example, suppose that C = {x S M^ : {pi,x) > 
1} U {x G M2 : {p2,x) > 1} where pi = (0.6,0.8) and pa = (0.6,-0.8). 
Let Wi{x) = 1 - {pi,x) - i(l - t), W2ix) = 1 - {p2,x) - i(l - t). Then 
W = W1AW2 is a subsolution and in fact provides an asymptotically optimal 
splitting scheme since M^(0, 0) = VF(0, 0). Numerical results are shown 
below. Each estimate was derived using 100,000 runs. In contrast to all 
the previous examples where the process evolves on a grid, the simulated 
process in this case may cross more than one splitting threshold in a single 
discrete time step. This appears to increase the variance somewhat (at least 
if the straightforward implementation as described in Section 2 is used), 
and hence we increased the number of runs to keep the relative variances 
comparable. 



n 


20 


30 


40 


Theoretical Value 


7.75 X K)-" 


4.33 X lO"" 


2.54 X 10""' 


Estimate 


7.65 X 10^'= 


4.22 X 10"" 


2.60 X 10"^" 


Std. Err. 


0.15 X 10"'= 


0.10 X 10"" 


0.07 X 10"^" 


95% C.I. 


[7.37,7.94] X lO"" 


[4.03,4.42] X 10"" 


[2.47,2.74] X 10"-"^ 


Time Taken (s) 


5 


10 


18 


Average no. partieles 


2 


2 


2 


S.D. no. particles 


2 


3 


3 


Max no. particles 


70 


61 


80 



Table 10. Sample mean for sums of iid. 
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Abstract 

Particle splitting methods are considered for the estimation of rare 
events. The probabiUty of interest is that a Markov process first enters 
a set B before another set A, and it is assumed that this probabiUty 
satisfies a large deviation scaling. A notion of subsolution is defined 
for the related calculus of variations problem, and two main results 
are proved under mild conditions. The first is that the number of par- 
ticles generated by the algorithm grows subexponentially if and only 
if a certain scalar multiple of the importance function is a subsolu- 
tion. The second is that, under the same condition, the variance of 
the algorithm is characterized (asymptotically) in terms of the subso- 
lution. The design of asymptotically optimal schemes is discussed, and 
numerical examples are presented. 

1 Introduction 

The numerical estimation of probabilities of rare events is a difficult problem. 
There are many potential applications in operations research and engineer- 
ing, insurance, finance, chemistry, biology, and elsewhere, and many papers 
(and by now even a few books) have proposed numerical schemes for partic- 
ular settings and applications. Because the quantity of interest is very small, 
standard Monte Carlo simulation requires an enormous number of samples 
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for the variance of the resulting estimate to be comparable to the unknown 
probability. It quickly becomes unusable, and more efficient alternatives are 
sought. 

The two most widely considered alternatives are those based on change- 
of-measure techniques and those based on branching processes. The former 
is usually called importance sampling, and the latter is often referred to as 
multi-level splitting. While good results on a variety of problem formulations 
have been reported for both methods, it is also true that both methods can 
produce inaccurate and misleading results. The design issue is critical, and 
one can argue that the proper theoretical tools for the design of importance 
sampling and splitting algorithms were simply not available for complicated 
models and problem formulations. [An alternative approach based on inter- 
acting particles has also been suggested as in [3]. However, we are unaware 
of any analysis of the performance of these schemes as the probability of 
interest becomes small.] 

Suppose that the probability of interest takes the form p = P {Z £ G} = 
/x(G), where G is a subset of some reasonably regular space (e.g., a Polish 
space S) and // a probability measure. In ordinary Monte Carlo one gener- 
ates a number of independent and identically distributed (iid) samples {Zi} 
from //, and then estimates p using the sample mean of Ir^.gG}. In the 
case of importance sampling, one uses an alternative sampling distribution 
v, generates iid samples {-^i} from u, and then estimates via the sample 
mean of [dfj,/di/] (Zj)lr^ ^^1. The Radon-Nikodim derivative [dfi/di'] {Zi) 

guarantees that the estimate is unbiased. The goal is to choose u so that 
the individual samples [dfi/dv] (Zj)lr^ ^^i cluster tightly around p, thereby 
reducing the variance. However, for complicated process models or events 
G the selection of a good measure v may not be simple. The papers [10, 11] 
show how certain standard heuristic methods based on ideas from large de- 
viations could produce very poor results. The difficulty is due to points in S 
with low probability under v for which d^./dv is very large. The aforemen- 
tioned large deviation heuristic does not properly account for the contribu- 
tion of these points to the variance of the estimate, and it is not hard to find 
examples where the corresponding importance sampling estimator is much 
worse than even ordinary Monte Carlo. The estimates exhibit very inaccu- 
rate and/or unstable behavior, though the instability may not be evident 
from numerical data until massive amounts have been generated. 

The most discussed application of splitting type schemes is to first en- 
trance probabilities, and to continue the discussion we specialize to that 
case. Thus Z is the sample path of a stationary stochastic process {Xi} 



(which for simphcity is taken to be Markovian), and G is the set of trajec- 
tories that first enter a set B prior to entering a set A. More precisely, for 
disjoint B and A and x ^ AU B, 

p = p{x) = P{Xj eB,Xi ^A,ie{0,...,j},j <cx)\Xo = x}. 

In the most simple version of splitting, the state space is partitioned accord- 
ing to certain sets B C Cq C Ci C ■ ■ ■ C Ck, with x ^ Ck and A n Ck = 0- 
These sets are often defined as level sets of a particular function V, which 
is commonly called an importance function. Particles are generated and 
killed off according to the following rules. A single particle is started at 
X. Generation of particles (splitting) occurs whenever an existing particle 
reaches a threshold or level Ci for the first time. At that time, a (possibly 
random) number of new particles are placed at the location of entrance into 
Ci. The future evolutions of these particles are independent of each other 
(and all other particles), and follow the law of {-'^i}. Particles are killed if 
they enter A before B. Attached to each particle is a weight. Whenever a 
particle splits the weight of each descendent equals that of the parent times 
a discount factor. A random tree is thereby produced, with each leaf corre- 
sponding to a particle that has either reached B or been killed. A random 
variable (roughly analogous to a single sample [dn/dv] {Zi)l(2.^Q\ from the 
importance sampling approach) is defined as the sum of the weights for all 
particles that make it to B. The rule that updates the weights when a par- 
ticle splits is chosen so that the expected value of this random variable is 
p. This numerical experiment is independently repeated a number of times, 
and the sample mean is again used to estimate p. 

There are two potential sources of poor behavior in the splitting algo- 
rithm. The first and most troubling is that the number of particles may be 
large. For example, the number could be comparable 6 for some 6 > 1. In 
settings where a large deviation characterization of p is available, the num- 
ber of levels itself usually grows with the large deviation parameter, and so 
the number of particles could grow exponentially. We will refer to this as 
instability of the algorithm. For obvious computational reasons, instability 
is something to be avoided. The other source of poor behavior is analogous 
to that of importance sampling (and ordinary Monte Carlo), which is high 
relative variance of the estimate. If the weighting rule leads to high varia- 
tion of the weights of particles that make it to S, or if too many simulations 
produce no particles that make it to B (in which case a zero is averaged in 
the sample mean), then high relative variance is likely. Note, however, that 
this problem has a bounded potential for mischief, since the weights cannot 



be larger than one. Such a bound does not hold for the Radon-Nikodim 
derivative of importance sampling. 

When the probability of interest can be approximated via large devia- 
tions, the rate of decay is described in terms of a variational problem, such 
as a calculus of variations or optimal control problem. It is well known 
that problems of this sort are closely related to a family of nonlinear par- 
tial differential equations (PDE) known as Hamilton-Jacobi-Bellman (HJB) 
equations. In a pair of recent papers [4, 6], it was shown how subsolutions 
of the HJB equations associated with a variety of rare event problems could 
be used to construct and rigorously analyze efficient importance sampling 
schemes. In fact, the subsolution property turns out to be in some sense 
necessary and sufficient, in that efficient schemes can be shown to imply the 
existence of an associated subsolution. 

The purpose of the present paper is to show that in certain circumstances 
a remarkably similar result holds for splitting algorithms. More precisely, 
we will show the following under relatively mild conditions. 

• A necessary and sufficient condition for the stability of the splitting 
scheme associated to a given importance function is that a certain 
scalar multiple of the importance function be a subsolution of the 
related HJB equation. The multiplier is the ratio of the logarithm of 
the expected number of offspring for each split and the gap between 
the levels. 

• If the subsolution property is satisfied, then the variance of the split- 
ting scheme decays exponentially with a rate defined in terms of the 
value of the subsolution at a certain point. 

• As in the case of importance sampling, when a subsolution has the 
maximum possible value at this point (which is the value of the cor- 
responding solution), the scheme is in some sense asymptotically op- 
timal. 

These results are significant for several reasons. The most obvious is that 
a splitting algorithm is probably not useful if it is not stable, and the subso- 
lution property provides a way of checking stability. A second is that good, 
suboptimal schemes can be constructed and compared via the subsolutions 
framework. A third reason is that for interesting classes of problems it is 
possible to construct subsolutions that correspond to asymptotically optimal 
algorithms (see [4, 6]). Subsolutions can be much easier to construct than 
solutions. In this context it is worth noting that the type of subsolution 



required for splitting (a viscosity subsolution [1, 7]) is less restrictive that 
the type of subsolution required for importance sampling. Further remarks 
on this point will be given in Section 5. 

An outline of the paper is as follows. In the next section we describe 
the probabilities to be approximated, state assumptions, and formulate the 
splitting algorithm. This section also presents a closely related algorithm 
that will be used in the analysis. Section 3 studies the stability problem, 
and Section 4 shows how to bound the variance of an estimator in terms 
of the related subsolution. The results of Sections 3 and 4 can be phrased 
directly in terms of the solution to the calculus of variations problem that 
is related to the large deviation asymptotics. However, for the purposes 
of practical construction of importance functions the characterization via 
subsolutions of a PDE is more useful. These issues are discussed in Section 
5, and examples and numerical examples are presented in the concluding 
Section 6. 

Acknowledgment. Our interest in the parallels between importance sam- 
pling and multi-level splitting was stimulated by a talk given by P.T. de 
Boer at the RESIM conference in Bamberg, Germany [2]. 

2 Problem Formulation 

2.1 Problem Setting and Large Deviation Properties 

A domain D C M is given and also a sequence of discrete time, stationary, 
Markov D— valued processes {A'"}. Disjoint sets A and B are given, and 
we set r" = min {i : A" E j4 U B}. The probability of interest is then 

p"(x„)=P{A?„Gi?|Ao" = x„}. 

The varying initial conditions are used for greater generality, but also be- 
cause initial conditions for the prelimit processes may be restricted to some 
subset of D. The analogous continuous time framework can also be used 
with analogous assumptions and results. For a given point x ^ AU B, we 
make the following large deviation-type assumption. 

Condition 1 For any sequence Xn — > x, 

lim \ogp"'{xn) = W{x), 



where W{x) is the solution to a control problem of the form 



inf / L{(j)(s),^(s)]ds. 



Here L : W^ x 'R'^ —f [0, oo], and the infimum is taken over all absolutely 
continuous functions (p with (j){0) = x, (f){t) € B,(f){s) ^ A for all s G [0,t] 
and some t < oo. 



Remark 2 The assumption that {X"} be Markovian is not necessary for 
the proofs to fohow. For example, it could be the case that X" is the first 
component of a Markov process (X",l^") (e.g., so-called Markov- modulated 
processes). In such a case it is enough that the analogous large deviation 
limit hold uniformly in all possible initial conditions Yq, and indeed the 
proofs given below will carry over with only notational changes. This can 
be further weakened, e.g., it is enough that the estimates hold uniformly 
with sufficiently high probability in the conditioning data. However, the 
construction of subsolutions will be more difficult, since the PDE discussed in 
Section 5 is no longer available in explicit form. See [6] for further discussion 
on this point. 

It is useful to say a few words on how one can verify conditions like Con- 
dition 1 from existing large deviation results. Similar but slightly different 
assumptions will be made in various places in the sequel, and in all cases 
analogous remarks will apply. 

For discrete time processes one often finds process-level large deviation 
properties phrased in terms of a continuous time interpolation X"(t), with 
X"'{i/n) = X" and X"'{t) defined by piecewise linear interpolation for t 
not of the form t = i/n. In precise terms, process-level large deviation 
asymptotics hold for {X"} if the following upper and lower bounds hold for 
each T e (0, oo) and any sequence of initial conditions Xn & D with Xn -^ x. 
Define 

/J(0) = y L(^(t>{s),4>{s))ds 

if (j) is absolutely continuous with (/)(0) = x, and Ix{(t>) = oo otherwise. If F 
is any closed subset of C([0, T] : D) then the upper bound 



limsup-logP{X'' GF|X"(0) =x„} < - inf /J 



\^) 



holds, and if O is any open subset of C([0,T] : D) then the lower bound 



liminf-logPjX" G O|X"(0) = x„} > - inf Z 

n— »oo n 0GO 



holds. It is also usual to assume that for each fixed x,T, and any M < oo, 
the set 

{0GC([O,r]:I?):/J((/>)<M} 

is compact in C([0, T] : D). The zero-cost trajectories (i.e., paths (f) for 
which I^{(j)) = 0) are particularly significant in that all other paths are in 
some sense exponentially unlikely. 

With regard to Condition 1, two different types of additional conditions 
beyond the sample path large deviation principle are required. One is a con- 
dition that allows a reduction to large deviation properties over a finite time 
interval. For example, suppose that there is T such that if (j) enters neither 
A nor B before T, then /J (<?!>) > W{x) + 1. In this case, the contribution to 
P^{xn) from sample paths that take longer than T is negligible, and can be 
ignored. This allows an application of the finite time large deviation prin- 
ciple. Now let G be the set of trajectories that enter B at some time t <T 
without having previously entered A. By the first condition, the asymptotic 
rates of decay of p^{xn) and P{X" G G |X"(0) = Xn} are the same. The 
second type of condition is to impose enough regularity on the sets A and B 
and the rate function l];{(j)) that the infimum over the interior and closure 
of G are the same. These points are discussed at length in the literature on 
large deviations [8]. 

Example 3 Assume the following conditions: L(-, •) is lower seniicontinu- 
ous; for each x G D,L{x,-) is convex; L{x,-) is uniformly superlinear; for 
each x (z D there is a unique point b{x) for which L{x, b{x)) = 0; b is Lips- 
chitz continuous, and all solutions to (p = b{(j)) are attracted to 9 ^ A, with A 
open. Let D C D be a bounded domain that contains A and B, and assume 
{b{x),n{x)) < for x G dV, where n{x) is the outward normal to V at x. 
Suppose that the cost to go from x to any point in dV is at least W{x) + 1. 
Then T as described above exists. 

2.2 The Splitting Algorithm 

In order to define a spitting algorithm we need to choose an importance 
function V{y) and a level size A > 0. We will require that V{y) be continu- 
ous and that V{y) <Q for all ?/ G -B. Later on we will relate V to the value 



function W, and discuss why subsolutions to the PDE that is satisfied by 
W are closely related to natural candidates for the importance function. 

To simplify the presentation, we consider only splitting mechanisms with 
an a priori bound i? < oo on the maximum number of offspring. The 
restriction is convenient for the analysis, and as we will see is without loss of 
generality. The set of (deterministic) splitting mechanisms will be indexed 
by j G {1, • • • , J}- Given that mechanism j has been selected, r(j) particles 
(with \r{j)\ < R) are generated and weights w{j) € M^ are assigned to 

the particles. Note that we do not assume Yll=i '^iU) — 1- The class of all 
splitting mechanisms (i.e., including randomized mechanisms) is identified 
with the set of all probability distributions on {1, . . . , J}. 
Associated with V are the level sets 

L, = {yeD: V{y) < z}. 

A key technical condition we use is the following. In the condition. Ex 
denotes expected value given Xq = x. 

Condition 4 Let z € [0, ^(x)] he given and define u" = min {i : X" G A U L^}. 
Then 



lim inf log Ex„ 



Hx:.eL.}iP''iX:^)f\>W{x)+ inf W^(2/). 



Under the conditions discussed after Condition 1 which allow one to con- 
sider bounded time intervals. Condition 4 follows from the Markov property 
and the large deviation upper bound. 

We also define collections of sets < Cq = B, C" = -^(j_i)A/n) J = 1; • • • f • 
Define the level function /" by /"(y) = min{j > : y € C^}- The location of 
the starting point corresponds to P(x) = [ny(x)/A], and P = indicates 
entry into the target set B. The splitting algorithm associated with a partic- 
ular distribution q will now be defined. Although the algorithm depends on 
V,q,r,w,Xn,^,A and B, to minimize notational clutter these dependencies 
are not explicitly denoted. 

Splitting Algorithm (SA) 

Variables: 

A'^^ number of particles in generation r 

X^f^ position of k particle in generation r 

w^f^ weight of k particle in generation r 



Initialization Step: 

for r = l,...,r(3;„) 

A^; = 
end 
Main Algorithm: 

for r = l,...,r(3;„) 

if N^_^ = then N;} = 
else 

for j = l,...,Af;_i 

generate Z^^^ a single sample of a process with the 
same law as X^ and initial condition Z" • q = X"_]^ ■ 

let r-^.=mf{i:Z«^.,GAuC-(^.^)_J 

Splitting Step begin 

if Z;? . ^n ^A 

let M be an independent sample from the law q 

for k = l,...,|r(M)| 

end 
end 
Splitting Step end 

end 

end 
end 
Construction of a sample: 

once all the generations have been calculated we form 

the quantity 

A'"" 



It should be noted that generation 1 consists of all of the particles that 
make it to set C^n(^\_i and more generally generation j consists of all par- 
ticles that make it to set Cln,^_-. We also define generation to be the 



initial particle. 

An estimate p^js^{xn) of p"'(x„) is formed by averaging a number of inde- 
pendent samples of Sg^. Observe that once generation r has been calculated 
the information about generations to r — 1 can be discarded, and so there 
is no need to keep all the data in memory until completion of the algorithm. 
Also note that in practice there is no need to split upon entering Cq = B. 




Figure 1: The Sets A and B and Level Sets of V . 



We first need to find conditions under which this splitting algorithm gives 
an unbiased estimator of p"(x„). To simplify this and other calculations we 
introduce an auxiliary algorithm titled Splitting Algorithm Fully Branching 
(SFB). The essential difference between the two is that the process dynamics 
are redefined in A to make it absorbing, and that splitting continues even 
after a particle enters A. When the estimate is constructed we only count the 
particles which are in B in the last generation, so that the two estimates have 
the same distribution. The SFB algorithm is more convenient for purposes 
of analysis, because we do not distinguish those particles which have entered 
A from those which still have a chance to enter B. Of course this algorithm 
would be terrible from a practical perspective-the total number of particles 
is certain to grow exponentially. However, the algorithm is used only for 



10 



the purposes of theoretical analysis, and the number of particles is not a 
concern. Overbars are used to distinguish this algorithm from the previous 
one. 

Splitting Algorithm Fully Branching (SFB) 

Variables: 

iV" number of particles in generation r 

X'^f^ position of k particle in generation r 

w)"^ weight of k particle in generation r 

Initialization Step: 

N^ = l, Xli=Xn, 1^^,1 = 1 

for _r = l,...,r(x„) 

iv;? = 

end 
Main Algorithm: 

for r = l,...,r(x„) 

for j = i,...,iy;Li 

generate Z'!^-^ a single sample of a process with the 
same law as Xf and initial condition Z"g = X^_^ • 

let r«^. = min{i : Z« .^, G A U C,'i(,^^)_ J 

Splitting Step begin 

let M be an independent sample from the law q 

for k = l,...,|r(Af)| 

yn yn 

end 

Splitting Step end 

end 

end 
Construction of a sample: 

once all the generations have been calculated we form 
the quantity 

„n _ V^ i"(a:n) 1 -n 
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Since the distributions of the two estimates coincide 



-Err 



■n: 



m -1 




1. <(-.),. 


- Ex„ 


j=i 





N, 



E 1 






Because of this, the SFB algorithm can be used to prove the fohowing. 

Lemma 5 An estimator based on independent copies of Sg^ is unbiased if 
and only if 

rr(M) 

E Y.w,{M) =1. 



Proof. It suffices to prove 



E^, 



n: 



l"(xn) 



E 






p'^ix. 



We wiU use a particular construction of the SFB algorithm that is useful 
here and elsewhere in the paper. Recall that with this algorithm every 
particle splits at every generation. Hence the random number of particles 
associated with each splitting can be generated prior to the generation of 
any trajectories that will determine particle locations. As a consequence, the 
total number of particles present at the last generation can be calculated, as 
can the weight that will be assigned to each particle in this final generation, 
prior to the assignment of a trajectory to the particle. Once the weights 
have been assigned, the trajectories of all the particles can be constructed 
in terms of random variables that are independent of those used to construct 
the weights. Since the probability that any such trajectory makes it to B 
prior to hitting A is p'^{xn), 



Ex 



V" 

E 



1. l"{xn),J j 



p"'iXn)Ex 



V" 

E 



w. 



l"-{Xn),j 



A simple proof by induction and the independence of the splitting from 
particle to particle shows that 



Ex 





/ 


'riM) 


2^ ^r"(^n)j 


M^ 


^ miM) 


J = l 


V 


i=l 



r(x„) 
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For the rest of the paper we restrict attention to spHtting mechanisms 
that are unbiased. 

3 Stability 

Now let an importance function V, level A, and splitting mechanism (q, r, w) 
be given. Define 

Jix,y) = inf f L U{s),Hs)) ds, (1) 

where the infimum is over absolutely continuous functions. A function W : 
Z? — > M will be called a subsolution if W{x) < for all a; G -B and if 
W{x) - W{y) < J{x,y) for all x,y e D\{AuB). In Section 5 we wih 
discuss conditions under which W can be identified as a viscosity subsolution 
for an associated PDE. Recall that a splitting algorithm is called stable if 
the total number of particles ever used grows subexponentially as n ^ oo. 
For a given splitting algorithm define 

W(.)='^i^Vi.). ,2) 

In this section we show that, loosely speaking, a splitting algorithm is stable 

if and only if 1^ is a subsolution. 

A construction that will simplify some of the proofs is to replace a 

given splitting mechanism by one for which all the weights are constant. 

Thus {q,r,w) is replaced by {q,r,w), where for each j = 1,...,J and 

i = '^,---,riJ), 

J 

{m{j)]-' = Er{M)=Y,r{j)qr 

j=i 

The new splitting mechanism is also unbiased, and the distribution of the 
number of particles at each stage is the same as that of {q,r,w). 

To establish the instability we make a very mild assumption on a large 
deviation lower bound for the probability that an open ball is hit prior to 
reaching A. This assumption can be expected to hold under conditions 
which guarantee Condition 1. 
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Proposition 6 Consider an importance function V , level A, and split- 
ting mechanism {q,r,w), and define W by (2). Suppose there exists y € 
D\ {A U B) such that W{y) > and 

W{x)-W{y)>J{x,y). (3) 

Assume that J{x,y) is continuous at y. Let p"{xn) be the probability that 
X" enters the ball of radius 6 > about y before entering A, given Xq = Xn, 
and assume 

liminf — logp"(2;„) > — inf J'{x,z). 

n^oo n z:\z-y\<S 

Then the corresponding splitting algorithm is not stable. 

Proof. It is enough to prove the instability of the algorithm that uses 
{q,r,w). Since J'{x,y) > 0, V{y) < V{x). From the definition of W in (2) 
and (3) there exist 5 > and e > such that for all z with |y — z| < S, 

[V(x)-V(.)l!2i^>j(.,.)+.. 

Let S = {z : \y — z\ < 5}. By taking 5 > smaller if necessary we can 
guarantee that SnA = ^ and V{z) > for all z e S. 



Suppose one were to consider the problem of estimating p^{xn)- One 
could use the same splitting mechanism and level sets, and even the same 
random variables, except that one would stop on hitting ^ or 5 rather 
than A ot: B, and the last stage would correspond to some number m" < 
V^{xn)- Of course, since V is positive on S it will no longer work as an 
importance function, but there is a function V = V — a that will induce 
exactly the same level sets as V and which can serve as an importance 
function for this problem. See Figure 2. The two problems can be coupled, 
in that exactly the same random variables can be used to construct both 
the splitting mechanisms and particle trajectories up until the particles in 
the p^{xn) problem enter C". 

If a particular particle has not been trapped in A prior to entering Cf , 
then that particle would also not yet be trapped in A in the corresponding 
scheme used to estimate p"'{xn)- Note also that the number of particles that 
make it to 5 = Cq are at most R times the number that make it to Cf. 
Let N^n denote the number of particles that make it to S in the SA used 
to approximate p^{xn), and let NJl, ■, be the number used in the SA that 

approximates p'^{xn). Then A'^j",^ -s > N^n/R. 
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Figure 2: Level Sets of V in Proof of Instability. 

Using the SFB variant in the same way that it was used in the proof of 
Lemma 5 and that the mechanism {q, r, id) is used, 



p'^iXn) 



Ex 



En- 



N" 



E 



■w. 



m" J 



J2 [EriM]]--" 



[Er{M)r'^"Ex 
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We now use the lower bound onp"(2;„) and that mP" jn -^ [V{x) — sup^g^ ^(-2)] f^'- 



1 



AT" 



hm inf — log E^^ 

n— >oo n 

= liminf-log^,„ p''{xn)[Er{M)Y 



n— >oo n 



> - inf J{x, z) + [^^"^ - 7-^^ ^^"^J log [Er{M)] 



> inf 

zes 

> e. 
It follows that 



[y{x) - v{z)] 

A 



log[^r(M)] -J{x,z) 



lim inf — log E^^ 

n— >oo fi 






{Xn) 



>e>0, 



which completes the proof. 



The next proposition considers stability. Here we will make a mild as- 
sumption concerning a large deviation upper bound, which can also be ex- 
pected to hold under conditions which guarantee Condition 1. 

Proposition 7 Consider an importance function V , level A, and splitting 
mechanism {q,r,w), and define W by (2). Suppose that 

W{x)-W{y)<J{x,y) 

for all x,y (^ D\{AL)B) and that W{y) < for all y & B. Consider any 
a G [0, y(x)], letp^{xn) he the probability that X^ enters level set La before 
entering A (given Xq = Xn), and assume 

limsup — logjJ"(xn) < — inf J'{x,z). 

n — ^00 ri z(^La 

Then the corresponding splitting algorithm is stable. 

Proof. For each n let r" be the value in {!,..., /"(x)} that maximizes r -^ 
Ex [N^]. Since r^/n is bounded, along some subsequence (again denoted 
by n) we have r"/n ^ f G [0, y(x)/A]. Using the usual argument by 
contradiction, it is enough to prove 



hmsup - log Ex„ [N^n] < 
n-^00 ri 
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along this subsequence. First suppose that v = 0. Given 5 > 0, choose 
n < oo such that r'^/n < 5 for all n > n. Then N]^A^" < R , and so 
lim sup,„_,j^ ^logEx^ [N^rl^''] < S ■ logR. Since J > is arbitrary, this case 
is complete. 




Figure 3: Level Sets of V in Proof of Stability. 



Now assume v € {0,V{x)/A] and let 5 € (0,f) be given. Suppose one 
were to consider the problem of estimating p"(x„) as defined in the statement 
of the proposition, with a = V{x) — A{v — 6). We again use the same 
splitting mechanism and level sets, except that we now stop on hitting A or 
^V{x)-A{v-5)- All importance function with these level sets can be found by 
adding a constant to V. We again couple the processes. Observe that entry 
into Cf for the p^{xn) problem corresponds to entry into d^n in the p'^{xn) 
problem for some m" such that mP" jn -^ [V{x)/A] — (v — 6) as n — > oo. 
Note that particles generated upon entry into the set C^n will correspond 
to generation l"'{x) — m^ of the algorithm. The final generation of the 
algorithm to estimate p^{xn) is generation /"(x) — rn^ + 1, corresponding 



to the particles generated upon reaching Cq = L 



since C;;,„+i C Ly(^) 



'V{x)-/^{v-5)- 



Observe that 



Mv-5) every particle in the algorithm used to estimate 
p'^{xn) that is not trapped in A by stage l'^{x) — m" + 1 will make it to the 
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target set -^y(x)--A(i)-5) iii the algorithm used to estimate p"(a;„). Hence 
iV;"„(^)_^„+i > iV;n(x)-m"+i' where iV,'^(^)_„n+i denotes the number of such 
particles for the SA used to estimate p"(a;„). 

We again use the SFB variant in the same way that it was used in the 
proof of Lemma 5 and the {q, r, w) splitting mechanism to obtain 



p^ixn) = [EriM)\ 



-{r(x)-m"+l) 



Et 






It follows from?7i"/n —>■ [V{x)/A] — {v — 6) that (P(x) — m" + l)/?i — > [v — S]. 
Using the upper bound on p^{xn)i 



hm sup - log E^^ iV;n(^)_^n+l 

1 



lim sup — log E. 

n— >oo '^ 



< 



< 



zeL 



r{xn) [^r(M)]'"(^)-'""+^ 
inf J{x, z) + [v- 5] log [Er{M)] 



V(x)-/\(v-6) 



sup 

^'=-^V-(i)-A(u-«) 



[V{x) - V{z)] 



\og[Er{M)]-J{x,z) 



< 0. 



For sufficiently large n we have r^ — {l^{x) — mP + 1) < 25n/ IS., and hence 






^25n/A^ It follows that 



1 



limsup-log£;,„ [7V;„] < (25/A) • logi?, 

n— >oo 'T' 

and since (5 > is arbitrary the proof is complete. ■ 



4 Asymptotic Performance 

Since the sample Sg^ has mean p"'(x„), any estimator constructed as an 
average of independent copies of Sg^ is unbiased and has variance propor- 
tional to var^:^ [•^sa]- Oiice the mean is fixed, the minimization of var^^^ [^saI 
among splitting algorithms is equivalent to the minimization of Ex^ {^sa\ ■ 
It is of course very difficult to find the minimizer in this problem. When a 
large deviation scaling holds, a useful alternative is to maximize the rate of 
decay of the second moment, i.e., to maximize 



1 2 1 

lim inf log E^^ [sgj^] = lim inf log E^^ 

n^oa n n—i-oo n 



'N, 
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By Jensen's inequality the best possible rate is 2W{x): 



liminf-ilog£;,„ [s^j,f > liminf--log^,„ [s^j,] > 2W{x). 

ri— >oo n n— too n 

The main result of this section is the following. 

Theorem 8 Consider an importance function V , level A, and splitting 
mechanism {q,r,w), and define W by (2). Suppose that 

W{x)-W{y)<J{x,y) 

for all X, y G D\ {A U B) and that W{y) < for all y G B. Assume also 
that Conditions 1 and 4 hold. Then 



hm --logi?,„ [s^^f = W{x) - V{x)- 

n^oo n 



r{M) , 



A 



Proof. It is sufficient to consider the SFB algorithm and prove that 

2 



lim log Ex 

rt— too n 



N, 



E 1 






W 



l''{Xu),j 



W{x)-V{x)- 



log [EElL'PmiM)' 



The proof is broken into upper and lower bounds. 
We first prove 



lim sup log Ex 

n-^oo n 



N 






< W{x)-V{x)- 






A 



(4) 



In the following display we drop cross terms to obtain the inequality, and 
then use the same construction as in Lemma 5 under which the weights and 
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trajectories are independent to obtain the equality. 



limsup logEx„ 

n—*oo f^ 



E 1 



; {*?.,.,...««}"''"''"''' 



< limsup log£'x„ 

n— ►oo ITi 



N, 



l"(xn) 






limsup log I p^{xn)Ex^ 

n— >oo JT- 



Af. 



!"(a:n) 






Suppose we prove that for any k (and in particular k = /'"(x^)), that 

"TV" 



E,r 



E (-". 



j=i 



r(Af) 



(5) 



Since l'^{xn) = \nV{xn)/^~\, (4) will follow from Condition 1. The proof of 
(5) is by induction. Let Mj denote the independent random variables used 
to define the splitting for the jth particle at stage k. Then 



-Er 






Err. 



N^ r{Mj) 



i=l 



£". 



r(M) 

E Y, MMf 

We now turn to the proof of the lower bound 



r(Af) 

E Y, Wi{M)' 

K + 1 



lim inf log Ex„ 

n— >oo 77, 



> W{x)-V{x)- 



N" 

\og {EY:ifnj,[Mf 

A 



(6) 
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For each stage k, let MJ} ■ denote the independent random variables used in 
the splitting of particle j € {l, . . . , -/V"}. Also, let I" ■ denote the disjoint 
decomposition of the particles in {l, . . . ,-/V"_^]^} according to their parent 
particle. Observe that if k,l G I^j-,k ^ /, then for all particles descended 
from k and /, k is the time of their last common ancestor. Given k £ I" •, 
let I^_^_i inu ) k denote the descendants of this particle at stage l'^{xn)- With 
this notation we can write 



Err, 



J = l 

E 

re=0 



{^r^i..,.^B}''Hx«i^ 



E,, 



TV" 



E E E 



1_ l"{xn),mf, j 



1, 






+E.„ 



N, 



E 1 



r {-«r»<„,j'^^} r'"(-.)j 



Let w" • denote the products of the weights accumulated by particle 
j € {l, . . . , TV"} up to stage k, and let w^.i i^/^ \ ^ denote the product of 

the weights accumulated by particle m G si,... ,^i^(x ) \ between stages 
K + 1 and the final stage. Finally, let TJ} denote the sigma algebra generated 
by M"-, s G {1, . . . , k} , j G {l, . . . , iV"} and the random variables used 
to construct X" • for these same indices. Note that the future weights are 
independent of .7^", and that the distribution of X." , n ^ depends on JT" 

only through X" ■ if A; G /",• and m G I" , ,„/, n ■.. We introduce the 
notation 



yn 






lr.-.„ „Ttt)" 



""=-'«;+l,i"{a:„),fe 
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By conditioning on J^J} we get 



Er. 



N" 









(7) 



H. + l,l"(xn),k 



E 

'*= «; + l,i"(a:„),i 



I. l"{xn),n-ii J 



-E.T 



-£':r 



7V!^ 



j^y,':,. 5^ wkiM:^^)zi,wiiMi^)zii 






E Kj E wk{MiM^Mi^)Ex:^, [zi,] £;^„ . [z:, 



j=i k,ieii^,^,k^i 



Using again the independence of the weights and trajectories as used in the 
proof of Lemma 5, we have 



E^.[zi,]=p-{x::^^). 



Since 



W = E^WkiM)wi{M) =E 
the final expression in (7) equals 

WE, 



Y,MM) 



E 



Y^MMf 






We conclude that 



E-r- 



E 1 



-|^ I l"{xn),3 J 



iXn),j 



l"{x„)-l 

K=0 



m 






+-Ex 



E 1 



1 {^rn(.„).eB} r/"(^.). 



22 



For a final time we use that the weights and trajectories are independent, 
and also (5), to argue that for any bounded and measurable function F and 
any stage k, 



Et. 



m 






E.^[F{Xl,)]E,,^ 



E 



'r{M) 



i=l 






E.^ [F {XI,)] 



Thus 



Err. 



N, 



E 1 






P(z„)-1 / r{AI) 

^ E Ue^*w| ^-" 

K=0 V i=l 



H^:,l^^}^" (^",l)^ 



r(M) 



i"(x„) 



S E ^*(^^)' ^-" 



j=l 



{^""(^.n),!^^}. 



Since /"(a;„) is proportional to n, to prove (6) it is enough to show that 
if Kn is any sequence such that K„/n -^ v & [0, V{x)/A], then 



lim inf log E^^ 

n— >oo n 



r{M) 

E ^ w,{Mf I £;,„ 



> W{x)-V{x)- 
Observe that {XJ^^^^A} implies X^„,i G Cp, 






[nV{x)/A]-Kn 



By Condition 4, 



hminf — log^,„ [l|^„^^^^|p" (^:„,i)' 



n— >oo 71 



> W{x) + inf W^(y). 

»/G9iv(a;)-uA 



By the subsolution property, W{y) > V (y) log Er{M)/ A. By Holder's 
inequality 

r{M) r(M) 

E E Wi{Mf ■ Er{M) > ^ E "^^ W = 1' 
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and therefore 



r{M) 



log LE ^ Wi{Mf < log {Er{M)). 



i=l 



Thus 



hm inf log Ex 

n— >oo n 



r{M) 

E Y, MM? I E^ 

i=l 



if vn 



{X]l^^,iA}P l^«„,l 



XI 



r{M) 



> -wlog Le y Wi{Mf + W{x) + inf 



log Er{M) 



i=l 

r{M) 



y&dLY(x)-vA 



'log Le ^ Wi{Mf + W{x) + log ^r(M) 



j=i 



A 



Vix) 

A 



V{y) 



> W{x)-V{x)- 



r{M) 



log{EZ?:"'MM) 



A 



and the proof is complete. 



4.1 Design of a Splitting Algorithm 

Suppose that V{x) and A are given and that we choose a splitting mechanism 
{q, r, w) which is unbiased and stable. By Theorem 8 the asymptotic rate of 
decay of the second moment is given by 



W{x)-V{x)- 



r{M) ^ 



iog[EZZrw,iM) 

A 



Recall from the proof of the last theorem that 

/ riM) \ 

-log IeY^ Wi{Mf < log {Er{M)) . 

Further equality holds if and only if Wi{m) = ^Jj^,j\ for alH € {!,..., r{m)} 
and all m € {1, . . . , J}. Given the value u = Er{M), an alternative splitting 
mechanism which is arguably the simplest which preserves the value and 
achieves the equality in Holder's inequality is that defined by J = 2 and 

qi = \u] -u,q2 = l- Qi, r(l) = \u] , r(2) = [nj , Wi{j) = 1/u all i,j. (8) 
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Given a subsolution W, the design problem and the performance of the 
resulting algorithm can be summarized as follows. 

• Choose a level A and mean number of particles u, and define an im- 
portance function V by logu ■ V{x)/A = W{x). Define the splitting 
mechanism by (8). The resulting splitting algorithm will be stable. 

• If Sg^ is a single sample constructed according to this algorithm, then 
we have the asymptotic performance 

hm -ilogi?,J(sgA)'] = W{x) + W{x). 

n— >oo 77, 



• 



The largest possible subsolution satisfies W{x) = W{x), in which case 
we achieve asymptotically optimal performance. 



Remark 9 Although the subsolution property guarantees stability, it could 
allow for polynomial growth of the number of particles. If in practice one 
observes that a large number of particles make it to B in the course of 
simulating a single sample s^^, then one can consider reducing the value of 
A slightly, while keeping the mechanism and V fixed. In PDE parlance, this 
corresponds to the use of what is called a strict subsolution. Because the 
value of W{x) is lowered slightly, there will be a slight increase in the second 
moment of the estimator. However, the strict inequality provides stronger 
control, and indeed the expected number of particles and moments of the 
number of particles will be bounded uniformly in 77,. 

5 The Associated Hamilton- Jacobi-Bellman Equa- 
tion 

The probability p'^{x) is intimately and naturally related, via the exponential 
rate W{x), with a certain nonlinear PDE. This relation is well known, and 
follows from the fact that W is characterized in terms of an optimal control 
or calculus of variations problem. We begin this section by defining the PDE 
and the notion of a subsolution in the PDE context. 

Our interest in this characterization is because it is more convenient for 
the explicit construction of subsolutions than the one based on the calculus 
of variations problem. See, for example, the subsolutions constructed for 
various large deviation problems in [6] and [4]. (It should be noted that the 
constructions in these papers ultimately produce classical subsolutions. In 
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contrast, the splitting algorithms require only the weaker viscosity subso- 
lution property. However, the smoother subsolutions constructed in [4, 6] 
are obtained as mollified versions of viscosity subsolutions, and it is the 
construction of these unmollified functions that is relevant to the present 
paper.) Other examples will be given in the next section. Since our only 
interest in the PDE is as a tool for explicit constructions, we describe the 
characterization formally and in the simplest possible setting, and refer the 
reader to [1, 7]. 
For qeW^, let 

M{x,q)= mf\{q,P)+L{x,p)]. 



Then under regularity conditions on L and the sets A and B, W can be 
characterized as the maximal viscosity subsolution to 

mix,DWix)) = 0,x^AuB, W{x) = !^^ lll^ . 

A continuous function T^ is a viscosity subsolution to this equation and 
boundary conditions if W{x) < for x E dB, W{x) < cxd for x G dA and 
if the following condition holds. If </> : M — > M is a smooth test function 
such that the mapping x — > [l^(x) — (/'(x)] attains a maximum at xq G 
R'^\(AUB), then m{xo,D(j){xo)) > 0. If W is smooth at xq then this 
implies M{xo,DW{xo)) > 0. 

Note that EI(x, •) is concave for each xq S W^, and hence the pointwise 
minimum of a collection of subsolutions is again a subsolution. It is this 
observation which makes the explicit construction of subsolutions feasible in 
a number of interesting problems (see [6]). 

In Section 2 we defined W{x) to be a subsolution to the calculus of 
variations problem if it satisfied the boundary inequalities and 

Wix)-Wiy)<Jix,y) 

for all x,y G M'^\(Aui?). We now give the elementary proof that these 
notions coincide. Let W{x) — <j){x) attain a maximum at xq. Thus for any 
P e R'^ and ah a G (0, 1) sufficiently small, W{xo + a(3) - 0(xo + a/3) < 
W{xq) — ^(xo)) and so 

(I>{xq) - 0(xo + a/3) < W{xq) - W{xQ + a/3) 
< J(xo,xo + a/3). 
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Since J'{x,y) is defined as an infimum over all trajectories that connect x 
to y, we always have 



J{xo,xo + aj3) < / L{xq + sl3,f3)ds. 
Jo 

Hence if, e.g., the mapping x — > L{x,j3) is continuous, then for all /? 

(f>{xo) - (t>{xo + a/5) < L{xo,P)a + o{a). 

Using Taylor's Theorem to expand (j), sending a J, and then infimizing over 
P gives 

< inf^ [{D(l){xo),(3) + L(xo, /3)] < M{x, Dct){xo)). 



Thus 1^ is a subsolution. 

The calculation just given does not show that W [the solution to the 
calculus of variations problem] is the maximal viscosity subsolution. Also, 
we have not shown that subsolutions to the PDE are also subsolutions in 
the calculus of variations sense. 

The characterization of W as the maximal viscosity subsolution requires 
that we establish W{x) < W{x) whenever T^ is a viscosity subsolution. A 
standard approach to this would be to show that given a viscosity subsolu- 
tion W, any point x ^ AU B, and any e > 0, there exists a smooth classical 
subsolution W^ such that W^{x) > W{x) — e. When this is true the classi- 
cal verification argument [7] can be used to show W'^{x) — W'^{y) < J{x^ y), 
and since e > is arbitrary T^ is a subsolution to the calculus of variations 
problem. This implies W{x) > W{x). 

Invoking smooth subsolutions brings us very close to the method of con- 
structing nearly optimal importance sampling schemes as described in [4, 6], 
where the design of the scheme must be based on the smooth classical subso- 
lution W^{x) rather than W{x). In all the examples of the next subsection 
the inequality W{x) < W{x) can be established by constructing a nearby 
smooth subsolution as in [4, 6]. 

6 Numerical Examples 

In this section we present some numerical results. We study four prob- 
lems: buffer overflow for a tandem Jackson network with one shared buffer, 
simultaneous buffer overflow for a tandem Jackson network with separate 
buffers for each queue, some buffer overflow problems for a simple Markov 
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modulated queue and estimation of the sample mean of a sequence of i.i.d. 
random variables. 

For each case, we present an estimate based on a stated number of runs, 
standard errors and (formal) confidence intervals based on an empirical es- 
timate of the variance, and total computational time (intended only for 
comparing cases). We also present the maximum number of particles, which 
is the maximum over all runs of niaXr<in.(xn) ^r, as well as the empirical 
mean and standard deviation of max^<p(^^) Nr- 

Subsolutions, even among those with the maximal value at a given point, 
are not unique, and indeed for the problems to be discussed there are some- 
times a number of reasonable choices one could make. We will not give 
any details of the proof of the subsolution property, but simply note that in 
each case it can be proved by a direct verification argument as discussed in 
Section 5. 

6.1 Tandem Jackson Network - Single Shared Buffer 

Consider a stable tandem Jackson network with service rates A < min{^^, /i2}. 
Suppose that the two queues share a single buffer and that we are interested 
in the probability 

p"' = -P{o,o) {total population reaches n before first return to (0,0) } 

It is well known that 

lim logp" = nim{pi, P2} 

where pj = log^. Further, the (continuous time) Hamiltonian that corre- 
sponds to subsolutions of the relevant calculus of variations problem is 

m{p) = -[A(e-P^ - 1) + /ii(e(Pi-P2) - 1) + P2(e^' " !)]• 

(see [4] for the discrete time analogue) . Without loss of generality (see [4] ) 
one can assume that //2 < Mi- ^Y inspection M{p) = for p = — log ^{l, 1) 
(this root is suggested by the form of the escape region), and W{x) = 
{p, x) + log ^ is a subsolution which is in fact the solution and so leads 
to an asymptotically optimal splitting scheme. The table below shows the 
results of a splitting simulation with 20,000 runs for X = 1, p.^ = ^2 — 4. 5 
and for various values of n. 
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n 


30 


40 


50 


Theoretical Value 


2.63 X 10-^" 


1.03 X lO"^-' 


3.80 X lO"''^ 


Estimate 


2.67 X 10^^" 


1.06 X lO^-^"' 


3.71 X 10^''^ 


Std. Err. 


0.11 X 10"^" 


0.05 X 10"^* 


0.20 X 10"''^ 


95% C.I. 


[2.45,2.88] X lO"-'" 


[0.97, 1.16] X lO"^* 


[3.32,4.10 X 10"''^ 


Time Taken (s) 


21 


52 


95 


Average no. particles 


23 


31 


37 


S.D. no. particles 


103 


155 


220 


Max no. particles 


2877 


4803 


8369 



Table 1. A = 1, /i^ = /i2 = 4.5, asymptotically optimal scheme. 

It was noted in Remark 9 that the number of particles generated may 
grow subexponentially in n and this appears to be reflected in the data. 
Following the suggestion of the remark, we also considered a slightly subop- 
timal but strict subsolution in the hopes of better controlling the number of 
particles with little loss in performance. The table below shows the results 
of numerical simulation for the same problem with a splitting algorithm 
based on the subsolution 0.93 x W{x). Again each estimate is obtained us- 
ing 20,000 runs. The results are in accord with our expectations. 



n 


30 


40 


50 


Theoretical Value 


2.63 X 10^^" 


1.03 X 10^^* 


3.80 X 10^'*^ 


Estimate 


2.72 X 10"^" 


1.08 X 10"^* 


3.67 X 10"'*^ 


Std. Err. 


0.15 X 10"^" 


0.08 X lO"^-' 


0.32 X lO"''^ 


95% C.I. 


[2.43,3.02] X 10"-"* 


[0.93, 1.23 X 10^^" 


[3.04, 4.30] X lO^''^ 


Time Taken (s) 


5 


9 


11 


Average no. particles 


8 


8 


8 


S.D. no. particles 


26 


28 


28 


Max no. particles 


628 


794 


632 



Table 2. X = 1, fii = fi2 — 4-5, asymptotically suboptimal scheme. 

6.2 Tandem Jackson Network - Separate Buffers 

In the paper [9] the authors address the problem of asymptotic optimality 
for splitting algorithms. In particular they consider an approach to choos- 
ing level sets that are claimed to be "consistent" with the large deviations 
analysis and show that this does not always lead to asymptotically opti- 
mal algorithms. They illustrate their results by considering, for a tandem 
Jackson network, the problem of simulating the probabilities 

p"' = -P(o,o) {both queues simultaneously exceed n before first return to (0,0)} 

It is shown that 



lim 

n— >oo 



1 



n 



logp" = p^ -h p2 = 7, 



and the authors propose a splitting algorithm based on the importance func- 
tion U{x) = 7 — 7min{3;i,a;2}, which is just a rescaling of the target set 
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B = {{x,y) : X > n or y > n}. They show that although the level sets given 
by this function may intuitively seem to agree with the most likely path to 
the rare set identified by the large deviations analysis, the resulting splitting 
algorithm in fact has very poor performance. By analyzing this importance 
function using the subsolution approach it is very easy to see why this is 
the case. The Hamiltonian corresponding to subsolutions is the same as in 
the previous section and it is clear to see that U{x) is not a subsolution. 
However, the function 



7 - Pi^i - P2X2 



W{x) 

is a subsolution. Further 1^(0) = 7, thus the corresponding importance 
function will lead to an asymptotically optimal splitting algorithm. Nu- 
merical results are presented for the cases A = 1,//]^ = 3, ^2 = 2 and 
A = 1,//^ = 2,/i2 = 3 which are the same rates originally considered in 
[9]. Each estimate was obtained by a simulation using 20,000 runs. 



n 


10 


20 


30 


Theoretical Value 


9.64 X 10"" 


1.60 X 10"-'=' 


2.64 X lO"^'' 


Estimate 


9.74 X 10"" 


1.58 X 10"-'=' 


2.66 X 10"'^'' 


Std. Err. 


0.17 X 10"° 


0.03 X 10"-''' 


0.06 X 10"^" 


95% C.I. 


[9.41, 10.1] X 10"" 


[1.52, 1.65] X 10"''' 


[2.54,2.79] X lO"^'' 


Time Taken (s) 


25 


188 


639 


Average no. partieles 


25 


53 


81 


S.D. no. particles 


47 


123 


213 


Max no. particles 


550 


1690 


3130 



Table 3. X = 1, fii = 3,^2 = 2, asymptotically optimal scheme. 



n 


10 


20 


30 


Theoretical Value 


9.64 X 10"" 


1.60 X lO"-'" 


2.64 X lO"^'' 


Estimate 


9.50 X 10"" 


1.56 X lO"-'" 


2.68 X 10""^-^ 


Std. Err. 


0.26 X 10"" 


0.06 X 10"-'" 


0.13 X lO"^'' 


95% C.I. 


[8.99, 10.0] X 10"" 


[1.44, 1.68] X 10""" 


[2.43,2.94] X lO"^'' 


Time Taken (s) 


19 


136 


468 


Average no. particles 


26 


54 


86 


S.D. no. particles 


74 


222 


448 


Max no. particles 


1055 


4905 


11350 



Table 4. X = 1, fii = 2, ^2 = 3, asymptotically optimal scheme. 

As expected, these results show a vast improvement over those obtained 
in [9]. Finally the tables below show the results of numerical simulation 
for the same problem with a splitting algorithm based on the subsolution 
0.95 X W{x). 



n 


10 


20 


30 


Theoretical Value 


9.64 X 10"" 


1.60 X 10"'" 


2.64 X lO"^'' 


Estimate 


9.51 X 10"" 


1.59 X 10"'-'' 


2.78 X lO"^'' 


Std. Err. 


0.20 X 10"" 


0.05 X 10"'" 


0.14 X lO"^"* 


95% C.I. 


[9.12, 9.91] X 10"" 


[1.48, 1.70] X 10"'" 


[2.51, 3.05] X 10"'^'' 


Time Taken (s) 


12 


53 


98 


Average no. particles 


13 


18 


19 


S.D. no. particles 


24 


37 


41 


Max no. particles 


255 


416 


477 



30 



Table 5. X = 1, fii = 3,/i2 = 2, asymptotically suboptimal scheme. 



71 


10 


20 


30 


Theoretical Value 


9.64 X 10^" 


1.60 X 10"-''' 


2.64 X 10"^" 


Estimate 


10.3 X 10"" 


1.52 X 10"-'=' 


2.32 X lO"^'' 


Std. Err. 


0.32 X 10"" 


0.08 X 10"-'=' 


0.20 X 10""^=^ 


95% C.I. 


[9.66, 10.9] X 10"" 


[1.35, 1.68] X 10"'=' 


[1.92, 2.72] X 10"^" 


Time Taken (s) 


10 


37 


67 


Average no. particles 


15 


18 


19 


S.D. no. particles 


39 


61 


72 


Max no. particles 


636 


1266 


2060 



Table 6. A = 1,/i]^ = 2,/i2 = 3, asymptotically suboptimal scheme. 

The choice of W{x) = 7 — PiXi — P2X2 as a subsolution may seem arbi- 
trary, however it turns out to be a very natural choice. Given a > con- 
sider a "nice" set B such that for the subsolution Wa{x) = a — PiXi — P2X2, 
Bn{x : Wa{x) > 0} = and B n {x : Wa{x) = 0} / 0. Then 



lim 

n— »oo 



1 



n 



logp" = a, 



where 



p" = P(o,o)(lueue reaches nB before first return to (0,0)). 

Intuitively this means that all points on a level set of the function Wa{x) 
have the same asymptotic probability. Thus given any such nice set B we 
can identify its large deviations rate by finding the unique a* such that 
Bn{x : Wa*{x) > 0} = and S n {x : Wa*{x) = 0} / 0. Further Wa*{x) 
will lead to an asymptotically optimal scheme. 

That the family of functions Wa has such a property is because the sta- 
tionary probabilities for a stable tandem Jackson network have the product 
form 7r({i,j}) = (1 ~ Pi)(l ~ P2)PiP2- Indeed, by using an argument based 
on the recurrence theorem, we can see that every stable tandem Jackson 
network has a family of affine subsolutions with the same property. Further 
this will be true for any A^-dimensional queueing network for which the sta- 
tionary probabilities vr have asymptotic product form, by which we mean 
that there exist pi, . . . , pj^ such that for any nice set B 



lim log7r(ni?) 



= infjxift H h xnPn ■ (^^i; ■ ■ ■ > ^n) G B}. 



6.3 Non-Markovian Process 

Since many models are non-Markovian we present an example of splitting 
for a non-Markovian process. Consider a tandem network whose arrival 
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and service rates are modulated by an underlying process Mt which takes 
values in the set {1,2}, such that the times taken for the modulating pro- 
cess to switch states are independent exponential random variables with 
rate 7(1) if M is in state 1 and 7(2) otherwise. Let A(l),/ii(l),/i2(l) and 
A(2),//]^(2),/Lt2(2) be the service rates of the queue in the first and second 
states respectively. It is known (see, e.g., [5]) that the Hamiltonian can be 
characterized in terms of the solution to an eigenvalue/eigenvector problem 
parameterized by p. This characterization is used for calculating the various 
roots to E[(p) = used below. 

Consider again the single shared buffer problem. Let A(l) = l,/i]^(l) = 
3.5,//2(l) = 2.5,7(1) = 0.2 and A(2) = 1,/Xi(2) = 4.5,^2(2) = 4.5,7(2) = 
0.5. Using a verification argument, one can show that W{x) = 1.00029(1 — 
xi — X2) is a subsolution with the maximal value H^(0). Thus using W{x) 
leads to an asymptotically optimal splitting scheme. The results of simula- 
tions run using this importance function are shown below, where again each 
estimate was derived using 20,000 runs. 



n 


30 


40 


50 


Theoretical Value 


6.36 X 10"^^ 


2.88 X 10"-" 


1.30 X 10""^^ 


Estimate 


6.66 X 10^^^ 


2.89 X 10^-" 


1.27 X 10^^^ 


Std. Err. 


0.23 X 10^^-' 


0.13 X 10^-" 


0.06 X 10^^^ 


95% C.I. 


[6.23,7.11] X 10--''^ 


[2.66,3.11] X IQ-^' 


[1.16, 1.38] X IQ-^^ 


Time Taken (s) 


8 


14 


20 


Average no. particles 


4 


5 


5 


S.D. no. particles 


10 


11 


13 


Max no. particles 


188 


214 


280 



Table 7. Markov-modulated network, total population overflow. 

It is also worth revisiting the separate buffers problem for the same 
queueing network. For the same arrival and service rates one can again use 
a verification argument to show that T^(a;) = 2.2771 — 1.2953x1 — 0.9818x2 
leads to an asymptotically optimal splitting scheme. Results of a simulation 
using 20,000 runs are shown below. 



n 


10 


20 


30 


Theoretical Value 


8.36 X 10""' 


1.07 X lO"-"* 


1.39 X IQ-^" 


Estimate 


8.24 X 10^^" 


1.04 X 10^ -'■^ 


1.36 X 10-^" 


Std. Err. 


0.19 X 10^^" 


0.03 X lO^-'"^ 


0.05 X 10^^'-' 


95% C.I. 


[7.85,8.62] X 10"-"' 


[0.98, 1.10] X IQ-^" 


[1.26, 1.45] X IQ-^" 


Time Taken (s) 


22 


150 


479 


Average no. particles 


31 


59 


89 


S.D. no. particles 


76 


182 


336 


Max no. particles 


1070 


3228 


7871 





Table 8. Markov-modulated network, simultaneous separate buffer 

overflow. 
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Finally we investigate what happens in this case if we use a strict sub- 
solution as importance function. The table below shows the results of a 
simulation using 20,000 runs based on the importance function 0.95 x W. 



n 


10 


20 


30 


Theoretical Value 


8.36 X 10""' 


1.07 X lO"-"* 


1.39 X IQ-^" 


Estimate 


8.28 X 10^^" 


1.06 X 10^ -'^ 


1.43 X 10^^" 


Std. Err. 


0.21 X ID"'-" 


0.04 X lO"-'"^ 


0.07 X 10"^" 


95% C.I. 


[7.87,8.69 X lO"-'" 


[0.98, 1.14| X IQ-^" 


[1.29, 1.56] X IQ-^'-' 


Time Taken (s) 


15 


70 


157 


Average no. partieles 


22 


31 


36 


S.D. no. partieles 


50 


8S 


116 


Max no. particles 


707 


1469 


3021 



Table 9. Markov-modulated network, asymptotically suboptimal scheme. 

6.4 Rare Events for the Sample Mean 

It is also worth noting that this approach works just as well for finite 
time problems. Assume that Xi,X2, ■ ■ ■ is a sequence of i.i.d. N(0,/ ) 
random variables where /^ is the A^-dimensional identity matrix and let 
^n = -^ Ym=i -^i- Suppose that we are interested in simulating the sequence 
of probabilities 



P 



P{Sn(^C} 



for some set C such that C does not include the origin. For j G {1, . . . , n} 
let Sn{j) = ^ X^i=i ^i- Then given sequences Xn, jn and x € M^, t G [0, 1] 
such that linin^oo Xn = x and lim„^oo jn/^ = t, the large deviations result 



hm — logP{5„ G C\Sn{jn) = Xn} = W{x,t) 
n— »oo 77, 

holds. Further the PDE corresponding to solutions of the calculus of varia- 
tions problem is (see [6]) 

Wt + iniU{DW;p) = 0, 

where IHI(s;/3) = (s,/3) + L{(3) and L{f3) = \\l3f/2. We can put this into 
the general framework in the standard way, i.e., by considering the time 
variable as simply another state variable. The set B, for example, is then 
C X {!}. Strictly speaking this problem does not satisfy the conditions used 
previously, since the sets A and B no longer have disjoint closure. Although 
we omit the details, it is not difficult to work around this problem. 
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It is easy to see that any affine function of the form 



W{x,t) 



{a,x) 



\a\ 



[l-t)H{a), 



where H{a) = ||a|p/2, is a subsolution, though it may not have the optimal 
value at (0, 0) and may not be less than or equal to zero on B. We can use 
the fact that the minimum of a collection of subsolutions is also a subsolution 
to build a subsolution which satisfies the boundary condition and has the 
maximal value at (0,0). For example, suppose that C = {x S M^ : {pi,x) > 
1} U {x € M^ : {p2,x) > 1} where pi = (0.6,0.8) and p2 = (0.6,-0.8). 
Let Wi{x) = 1 - {pi,x) - i(l - t), W2{x) = 1 - {p2,x) - i(l - t). Then 
W = W1AW2 is a subsolution and in fact provides an asymptotically optimal 
splitting scheme since Ty(0, 0) = VF(0, 0). Numerical results are shown 
below. Each estimate was derived using 100,000 runs. In contrast to all 
the previous examples where the process evolves on a grid, the simulated 
process in this case may cross more than one splitting threshold in a single 
discrete time step. This appears to increase the variance somewhat (at least 
if the straightforward implementation as described in Section 2 is used), 
and hence we increased the number of runs to keep the relative variances 
comparable. 



n 


20 


30 


40 


Theoretical Value 


7.75 X 10-" 


4.33 X lO"** 


2.54 X 10""' 


Estimate 


7.65 X 10"'= 


4.22 X 10"" 


2.60 X 10"^" 


Std. Err. 


0.15 X 10"'= 


0.10 X 10"" 


0.07 X 10"^" 


95% C.I. 


[7.37,7.94] X 10"" 


[4.03,4.42] X lO"'* 


[2.47,2.74] X 10""' 


Time Taken (s) 


5 


10 


18 


Average no. particles 


2 


2 


2 


S.D. no. particles 


2 


3 


3 


Max no. particles 


70 


61 


80 



Table 10. Sample mean for sums of iid. 
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